The realization that all theories of
behavior are based, either explicitly 
or implicitly, upon some conception 
of nervous system function has made 
it increasingly apparent that a con- 
sideration of the neural mechanisms 
associated with behavior assumes 
the necessary correlation of two 
classes of interdependent variables. 
Recognition of the relatedness of 
neurophysiology and psychology has 
been facilitated by an extension of 
the interest of neurophysiologists 
from static, reflex-like mechanisms 
to central systems with a plasticity 
and time-course more appropriate to 
behavioral events. It has, in turn, 
contributed to an awareness on the 
part of psychologists that common 
neural processes underlie many be- 
havioral phenomena which have been 
operationally defined in terms of in- 
dependent, mutually exclusive cate- 
gories. 

Research on central nervous sys- 
tem structures has repeatedly indi- 
cated that the reticular formation is 
critically involved in many psycho- 
logical functions. The literature in 
this area has become so extensive 
that some criterion must be utilized 
in the selection of topics for cover-
age. In this review, that criterion
has been “behavioral relevance,”
and the aspects of reticular function
discussed are those considered par-
ticularly germane to psychological
phenomena. A brief review of the
basic structural and functional char-
acteristics of the reticular formation
will therefore serve as introduction
to the following topics:

1. Interaction of specific and non-
specific systems;

2. Central control of afferent in-
put;

3. Cortical projections to the re-
ticular system;

4. The reticular system and the
learning process.

These areas are highly interrelated,
and the decision to consider a par-
ticular study in one category, rather
than another, is, in many cases, quite
arbitrary.

1 This paper incorporates ideas worked out
in discussion with Ausma Rabe. The author
wishes to express her appreciation to E. L.
Walker, C. J. Smith, and Ausma Rabe for
their critical advice and assistance in the prep-
aration of this manuscript.


ANATOMICAL AND PHYSIOLOGICAL 
PROPERTIES OF THE RETICULAR 
SYSTEM 


Extensive anatomical and physio- 
logical investigations confirm the 
highly differentiated organization of 
the reticular formation. Both struc- 
tural complexity and functional plas- 
ticity indicate its capacity to mediate 
a wide range of behavioral processes. 

The reticular formation may be di-
vided into two functional systems:
the brain stem reticular formation
and the diffusely projecting thalamic
nuclei. The brain stem reticular for-
mation includes structures at the
level of the medulla, pons, midbrain,
subthalamus, and hypothalamus
(Magoun: 1950, 1952a, 1952b, 1954). 
The midbrain reticular formation oc- 
cupies a position of prime impor- 
tance within this system. The dif- 
fusely projecting thalamic nuclei 
(also referred to as the thalamic re- 
ticular system) include ventralis an- 
terior, centre median, nucleus re- 
ticularis, and the intralaminar nu- 
clei (Jasper: 1949, 1954; Jasper & 
Ajmone-Marson, 1952; Starzl & Ma- 
goun, 1951; Starzl & Whitlock, 1952). 
Both the brain stem reticular and 
the thalamic reticular systems, when 
activated, induce a desynchronization 
of resting alpha rhythms through-
out the cortex. This electrophysio-
logical “arousal” response is, in
general, correlated with an alert con- 
scious state of the organism (Jasper: 
1949, 1954; Jasper & Ajmone-Mar- 
son, 1952; Magoun: 1952a, 1952b, 
1954). 

The behavioral effects which ac- 
company either stimulation or abla- 
tion of the two systems are varied. 
Stimulation of the midbrain reticular 
formation and of the centre median 
nucleus in the thalamus has been 
shown to result in the following se- 
quence of events: at low voltages of 
stimulation, a sleeping animal opened 
his eyes and reacted to auditory and 
visual stimuli; at a slightly higher 
voltage, the animal awoke and looked 
around searchingly in a puzzled man- 
ner; with further increases in inten- 
sity, there was abrupt arousal, crouch- 
ing, flight, fear, agitation, and finally 
frantic efforts to escape. Stimulation 
of the intralaminar nuclei in the 
awake animal produced an arrest re- 
action in which the animal became 
oblivious to sensory stimuli. This 
impairment of awareness and move-
ment outlasted the duration of the
stimulus (Hunter & Jasper, 1949). 
Slower frequency stimulation of these 
nuclei also produced sleep (Hess, 
1954). 

With the thalamic reticular system 
intact, lesions of the midbrain reticu- 
lar formation produced a chronically 
comatose, hypokinetic animal which
could not be aroused behaviorally. In 
these preparations, the EEG still 
showed an activation pattern to in- 
tense stimuli, but this activation did 
not outlast the period of application 
of the stimulus. This is in contrast to 
animals in which the brain stem re- 
ticular system was intact, but whose 
specific sensory projection paths had 
been transected. These animals gave 
evidence of both behavioral and elec- 
trophysiological arousal over sus- 
tained periods of time, even though 
the specific sensory impulses failed to
reach the cortex. Lesions of the tha- 
lamic intralaminar nuclei have also 
been reported to produce lethargy, 
somnolence, and motor disability 
(French & Magoun, 1952; French, 
Von Amerongen, & Magoun, 1952; 
Hanberry & Jasper, 1953; Ingram, 
1952; Lindsley, Bowden, & Magoun, 
1949; Lindsley, Schreiner, Knowles, 
& Magoun, 1950). 

These studies illustrate a series of 
highly critical points. First, it is ap- 
parent that the cortical arousal re-
sponses induced by stimulation of the 
brain stem reticular system are inde- 
pendent of the specific sensory path- 
ways, since they persist after the lat- 
ter have been transected. Second, 
the arrival of specific sensory im- 
pulses in the cortex is not, in the ab- 
sence of nonspecific reticular activ- 
ity, a sufficient condition for the 
conscious perception of these im- 
pulses. Third, the interconnections 
between the diffuse thalamic nuclei 
and the cortex are not by themselves 
capable of preserving the waking 
state beyond the immediate period of
bombardment by afferent impulses 
from the periphery. Maintained 
wakefulness depends on the integrity 
of the brain stem reticular formation, 
since, in its absence, activation will 
not outlast the stimulus. However, 
the fact that even with the brain 
stem reticular formation destroyed, 
activation of the cortex by sensory 
stimuli for brief periods is still possi- 
ble, suggests that these stimuli also 
affect the thalamic reticular system. 

It has now been established that all 
sensory modalities, both interocep- 
tive and exteroceptive, give off col- 
laterals to both the brain stem and 
thalamic reticular systems. Thus, 
visual, auditory, olfactory, tactile, 
pain, proprioceptive, and visceral
stimuli are all capable of activating 
both components of the reticular for- 
mation (Arduini & Moruzzi, 1953; 
Bremer, 1954; French, Verzeano, & 
Magoun, 1952; French, Verzeano, & 
Magoun, 1953; French, Von Amer- 
ongen, & Magoun, 1952; Morin, 
1953; Starzl & Magoun, 1951; Starzl 
et al., 1951a; Starzl et al., 1951b; 
Zanchetti, Wang, & Moruzzi, 1952). 
Auditory stimuli, for example, feed in 
at many levels from below the infe- 
rior colliculi in the midbrain as far for- 
ward as the posterior thalamus 
(Starzl et al., 1951b). The other sen- 
sory modes seem to have similar dis- 
persions. It is equally important to 
note that not only do collaterals from 
the specific paths enter the reticular 
formation at several points, but the 
reticular system also influences the 
specific sensory and motor pathways 
at many levels, either through direct 
collaterals or by affecting internun- 
cial neurones (Austin & Jasper, 1950; 
Lindsley, 1956; Magoun, 1950). A 
further source of reticular activation 
is provided by direct projections from 
certain cortical areas. These will be
discussed at greater length in a subse- 
quent section of this paper. 

The collaterals from the specific 
sensory paths and the cortical pro- 
jections terminate upon both the 
brain stem reticular and the thalamic 
reticular neurones in a convergent 
pattern. It is common to find a 
single reticular unit responding to 
two or three sensory modes. How- 
ever, none of the cells recorded from 
could be fired by all types of stimuli 
(French & Hernández-Peón, 1955;
Hernández-Peón & Hagbarth, 1955;
Moruzzi, 1954; Scheibel, Scheibel,
Mollica, & Moruzzi, 1955). Since
both the latency and pattern of firing 
of a single unit vary for different loci 
of stimulation, the reticular cell is, to 
a certain extent, capable of “know-
ing” its source of activation.

The many similarities between the 
brain stem reticular formation and 
the diffuse thalamic nuclei should 
not obscure the differentiations which 
also exist. These will become more 
apparent upon a closer consideration 
of the functional and structural char- 
acteristics of the two systems. One 
of the most striking differences con- 
cerns the arousal response itself. Two 
types of activation patterns have 
now been distinguished (Sharpless & 
Jasper, 1955). The first of these, a
“tonic” reaction, has been referred to
the brain stem reticular system. 
This reaction varies in duration from 
a few seconds to many minutes, has 
a long latent period following the 
stimulus, is subject to rapid habitua- 
tion, and tends to recover slowly over 
periods of hours or days. The sec- 
ond, a “phasic” pattern, is presumed 
to be a function of the diffuse thalam- 
ic system. It rarely outlasts the 
stimulus by more than 10 or 15 sec- 
onds, has a short latency, is very re- 
sistant to habituation, and once 
habituated, recovers within a few 
minutes. The resistance to adapta-
tion of the phasic response would
seem to be of special significance.
Because the arousal mediated by
the thalamic nuclei is of short
duration and resists habituation,
the thalamic system should continue
to respond to repeated stimuli and
thus to mediate a more differenti-
ated attentional state to a stimulus
after the first gross arousal induced
by the brain stem reticular formation
has adapted out.

A comparison of their functional 
projections, both cortical and caudal, 
illustrates further distinctions be- 
tween the brain stem and thalamic 
reticular systems. The ascending 
brain stem reticular units are gen-
erally assumed to be diffuse cortical
activators, while the descending re- 
ticular projections are known to be 
more discrete in their action. Corti- 
cifugal units control the transmission 
of the specific evoked potential at 
many levels of all sensory projection 
paths (Galambos, 1956; Hagbarth & 
Kerr, 1954; Hernández-Peón, Scher-
rer, & Jouvet, 1956; Lindsley, 1956)
and are capable both of facilitating 
and depressing activity in the motor 
pathways (Bernhaut, Gellhorn, & 
Rasmussen, 1953). There is also 
some structural localization within 
the brain stem reticular formation 
with respect to the reception of stim- 
ulating agents. Adrenaline, for ex- 
ample, has been shown to produce 
the cortical arousal response through 
its action upon a specific portion of 
the midbrain tegmentum. With this 
section of the tegmentum destroyed, 
adrenaline no longer had an effect 
(Rothballer, 1956). 

The discreteness of the brain stem 
reticular formation is particularly 
evident in its descending projections; 
the specificity of the thalamic retic- 
ular system, however, seems to be 
directed cephalically. Although the
diffuse thalamic nuclei activate all 
regions of the cortex, including the 
primary sensory areas (Jasper, 1949; 
Jasper, 1954; Jasper & Ajmone- 
Marson, 1952), there is strong evi-
dence of regional localization in their 
cortical projections. The medial 
thalamic nuclei project primarily to 
the anterior cortex, while the lateral 
nuclei activate the posterior portion. 
Stimulation of different points in the 
thalamic reticular system produces 
different patterns of activation in the 
cortex (Jasper, 1954; Jasper, Naquet, 
& King, 1955). In addition to their 
role in desynchronizing the cortex at 
high frequencies of stimulation (i.e., 100
cycles per second), the diffuse thalam-
ic nuclei are unique in their capac-
ity to synchronize cortical rhythms
at low frequencies of stimulation
(i.e., 10 cycles per second). Repeti-
tive stimulation of the thalamic re-
ticular areas at frequencies corre- 
sponding to those of the natural al- 
pha rhythms produces a cortical re- 
sponse of increasing amplitude—the 
so-called recruiting response (Demp-
sey & Morison, 1942; Jasper, 1954; 
Morison & Dempsey, 1942). This ex- 
perimentally produced response,
which is independent of the specific 
sensory pathways, is assumed by Jas- 
per and his co-workers to involve the 
same neural mechanisms as the nat-
urally occurring alpha rhythms (Jas-
per, 1954). If this hypothesis is true,
then both the recruiting response and
the alpha rhythm are a function of the
regulatory control exercised by the
thalamic system upon the cortex.
The possible role of the diffuse thalam-
ic nuclei in timing cortical rhythms
is particularly relevant to psycholog-
ical phenomena because of the im-
portance of the slow alpha-like waves
as regulators of spike discharge in 
the cortex. Evidence has been pre- 
sented of a fairly high, although not 
invariant, correlation between the
firing of the spike and the phase of 
the cortical slow waves (Gellhorn, 
Koella, & Ballin, 1954; Jasper, 1954; 
Li, Cullen, & Jasper, 1956a; Li, Cul- 
len, & Jasper, 1956b). This relation- 
ship, if valid, would suggest that the 
alpha-like slow waves are able to af- 
fect the discharge of the specific 
evoked potential and thus to influ- 
ence the transmission and elabora- 
tion of stimuli in the cortex (Jasper, 
1954). The possible implications of 
this regulatory function on behav- 
ioral processes such as attention, per- 
ception, and memory will be dis- 
cussed in later sections of this paper. 

This brief review of the properties 
of the reticular systems indicates 
that there are indeed differentiations, 
both structural and functional, be- 
tween its component parts—with 
the brain stem reticular system act- 
ing upon the cortex in a more global 
fashion than the diffuse thalamic 
nuclei. It would seem perfectly ap- 
propriate to equate the arousal func- 
tion of the brain stem reticular for- 
mation with a “generalized drive
state” (Hebb, 1955), since this sys-
tem does possess the anatomical and 
physiological attributes (i.e., control 
of the level of activation of the organ- 
ism by virtue of its sensitivity to ex- 
teroceptive, interoceptive, hormonal, 
and cortical stimuli) which would 
enable it to fulfill the behavioral re- 
quirements of a drive concept. It is 
also apparent, however, that to cor- 
relate the brain stem reticular forma- 
tion uniquely with “drive” is both to 
limit its conceptual value unneces- 
sarily and to disregard its other func- 
tional characteristics (such as its role 
in the control of sensory input). The 
operational procedures which psy- 
chologists categorize as “reward” and
“punishment” also serve to activate
the organism and to narrow the be-
havioral field. Reward and punish-
ment, then, would appear to have a 
relation to reticular activity which is 
similar to that of drive. Perhaps 
as psychologists clarify the assump- 
tions underlying their concepts, 
many other supposedly independent 
categories previously regarded as 
mutually exclusive will be recognized
as functionally interrelated on the 
basis of a common factor of reticu- 
lar activation. 

Although the attributes of the 
brain stem reticular system qualify it 
as an appropriate neural substrate 
for general behavioral constructs 
such as “drive,” the role of the diffuse
thalamic nuclei would be obscured 
rather than clarified by such an 
equivalence. Reference has already 
been made to the more flexible opera- 
tion of this system—its ability to reg- 
ulate cortical excitability, the locali- 
zation of its cortical projections, its 
suppressive and facilitatory effects 
upon spike discharge, etc. A system 
such as this, functionally organized 
in a manner which would permit it to 
control the continuum of conscious- 
ness and to serve as a selective mech- 
anism for the facilitation of certain 
perceptions, sensations, and memo-
ries, as well as the inhibition of
others, would seem a rich source in-
deed for the neural mechanisms
which support highly differentiated
behavior.


INTERACTION BETWEEN SPECIFIC 
AND NONSPECIFIC SYSTEMS 

In his 1954 article on “Drive and 
the Conceptual Nervous System,” 
Hebb (1955) proposed a curvilinear 
relationship between drive or arousal, 
defined as the “level of nonspecific 
cortical bombardment through the 
ascending reticular system,” and cue, 
defined in terms of the cortical recep- 
tion and elaboration of the specific 
sensory evoked potential, as they re-
late to learning. In defining his axes 
both psychologically and neurophysi- 
ologically, Hebb has done three 
things: he has summarized the psy- 
chological literature indicating a cur- 
vilinear relationship between such 
variables as induced muscle tension 
and the learning of nonsense sylla- 
bles, memory span, etc.; he has pos- 
tulated a correspondence between 
physiological and psychological vari- 
ables which permits psychological 
concepts to be operationalized in 
terms of measurable neurophysio- 
logical variables; and he has focused 
attention upon the interaction of the 
specific and nonspecific systems as 
they affect such processes as learn- 
ing and memory. This attempt to re- 
late nonspecific input to learning 
would seem to be a natural out- 
growth of Hebb’s (1949) earlier
“dual process” theory of memory
and learning. If a certain period of 
reverberatory neural activity is an 
essential prerequisite to the forma- 
tion of the permanent structural 
trace upon which learning and mem- 
ory depend, then the level of non- 
specific input, as a critical factor de- 
termining the extent and duration of 
the elaboration of the specific evoked 
potential, should indeed bear a law- 
ful relationship to learning and mem- 
ory. 

That the interaction of specific 
and nonspecific activity may be rele- 
vant to perceptual processes, as well 
as learning and memory, is indicated 
by the evidence that the mere arrival 
of the afferent sensory volleys in the 
cortex is not sufficient to insure 
conscious sensation. Under deep an- 
esthesia, which depresses reticular 
activity (Magoun, 1954), the spe- 
cific evoked potentials appear in the 
sensory receiving areas in an en- 
hanced form under conditions which
would preclude their conscious recep- 
tion (Gellhorn, 1954; Lindsley, 1956). 
It would appear, then, that the clas- 
sical afferent systems transmit the in- 
formation which forms the specific 
content of consciousness, but do not 
per se mediate awareness (Gellhorn, 
1954). Rather, it is activity in the 
nonspecific reticular systems which 
provides the essential neurophysio- 
logical condition for the processes of 
perception, attention, and sensation. 
If this viewpoint is valid, and if one 
is willing to accept the further as- 
sumption that the amplitude of the 
specific evoked potential under nor- 
mal (unanesthetized) conditions pro- 
vides an index of the extent to which 
sensory input is transmitted and 
hence perceived by the organism, 
then varying levels of nonspecific ac- 
tivity should, through their effect 
upon the specific evoked potential, 
be associated with changes in percep- 
tion. This hypothesis is supported 
by a study on humans utilizing both 
recordings of the visual specific 
evoked potential and phenomenal 
report, in which a decreased ampli- 
tude of the evoked potential was 
found to be correlated with a phe- 
nomenal report of decreased inten- 
sity of light (Hernández-Peón &
Donoso, 1957). Unfortunately, stud-
ies correlating neurophysiological re-
cordings and observers' reports are as
yet rare in the literature. The data 
to be reviewed in this section are 
therefore primarily neurophysiolog- 
ical in nature and any psychological 
implications derived from them must 
be, to a large extent, based upon 
theoretical assumptions rather than 
direct experimental proof of neuro- 
physiological and behavioral equiva- 
lence. 

Anatomically, the specific and 
nonspecific pathways represent two 
distinct systems which are closely in- 
terrelated at all levels. The reticular
formation not only draws collaterals 
from the ascending sensory paths 
throughout its course (French et al., 
1953; Magoun: 1952, 1954; Starzl, 
et al.: 1951a, 1951b), but also feeds 
back into the sensory paths at sev- 
eral levels (Nakao & Koella, 1956). 
In the auditory projection pathways, 
for example, reticular stimulation has 
been shown to affect the size of the 
specific evoked potential at the coch- 
lea (Galambos, 1956), the dorsal 
cochlear nucleus (Hernández-Peón,
Jouvet, & Scherrer, 1957), and the 
medial geniculate (Nakao & Koella, 
1956). Within the cortex itself, at 
least two types of convergent organ- 
ization have been found to exist. In 
the sensory cortex, the specific and 
nonspecific afferent fibers terminated 
on different neurones. Cells which 
responded to the stimulation of non- 
specific afferents could not be fired by 
specific afferents, and vice versa. In- 
teraction occurred primarily through 
a series of interneurones which were 
facilitated by the nonspecific volleys 
and in turn affected the excitability 
of the specific elements (Li & Jasper, 
1953). In the visual cortex, some 
cells have been found to respond to 
both specific and nonspecific stim- 
ulation, while others were activated 
only by reticular volleys (Lindsley, 
1956). 

The interaction component con- 
tributed by the classical sensory sys- 
tems is spatially localized as a conse- 
quence of the high degree of topo- 
graphical representation which char- 
acterizes the projection patterns of 
these systems. With regard to the 
nonspecific structures, there is gen- 
eral agreement that the arousal elic- 
ited by the brain stem reticular sys- 
tem is of a diffuse nature (Jasper, 
1954; Magoun, 1954). However, the 
degree of cortical localization of the
nonspecific thalamic reticular pro- 
jections has been a matter of con- 
troversy. Starzl et al. (Starzl & Ma- 
goun, 1951; Starzl et al., 1951a; 
Starzl & Whitlock, 1952) failed to 
find any degree of topographical lo- 
calization, while Jasper and his co- 
workers (Hanberry & Jasper, 1953; 
Jasper: 1949, 1954; Jasper & Ajmone- 
Marson, 1952) have repeatedly re- 
ported a definite organization within 
the thalamic nuclei, with the stimula- 
tion of different loci producing vary- 
ing patterns of cortical activation. 
This conflict appears to have been re- 
solved by the Jasper, Naquet, and 
King study (1955) in which it was 
determined that under appropriate 
anesthesia and with just-threshold 
intensities of stimulation, discretely 
localized patterns of cortical re- 
sponse could be induced by stimula- 
tion of the diffuse thalamic system. 
A structural basis for these results 
was provided by Chow’s work on re- 
gional degeneration within the tha- 
lamic reticular nucleus as a conse-
quence of selective cortical ablations, 
which indicated that extirpation of 
each cortical projection area is fol- 
lowed by retrograde degeneration of 
both its specific thalamic relay nu- 
cleus and of a localized adjacent por- 
tion of the nucleus reticularis. Chow 
(1952) concluded, on the basis of his 
findings, that although the reticular 
nucleus taken as a whole may project 
to the entire cortex, “there is an or-
derly arrangement of connections be-
tween different sectors of the reticu-
lar nucleus and different cortical
fields.” There exists, then, anatom-
ical provision for a selectively local-
ized activation of such areas as the
striate cortex, temporal lobe, audi-
tory cortex, sensori-motor cortex,
etc., by the diffuse thalamic nuclei.
This viewpoint is also shared by Gas- 
taut (1954), whose extensive EEG 
studies led him to conclude that in
addition to specific fibers from the re- 
lay and association nuclei of the thal- 
amus, each cortical region received 
topographically organized nonspecif- 
ic fibers from the intralaminar and 
reticular nuclei of the thalamus. 
Whether the corticopetal fibers uti- 
lized by the thalamic reticular sys- 
tem are also shared by the brain 
stem reticular system, or whether the 
localized and diffuse arousal systems 
are independent of each other in their 
cortical projections remains to be de- 
cided. However, the functional value 
of a more differentiated arousal sys- 
tem capable of localized control of 
the specific projection areas is un- 
questionable. Through selective fa- 
cilitation or inhibition of various sen- 
sory inputs in the cortex, such an 
anatomical arrangement would pro- 
vide discriminative control over the 
elaboration of the specific sensory po- 
tentials at a cortical level and thus 
permit greater flexibility and a more
finely graded regulation of processes 
involving selective awareness, per- 
ception, and memory than is possible 
through peripheral sensory control 
alone. 

Neurophysiological data on inter-
action may be divided into three
categories: (a) interaction between
the two nonspecific systems; (b) in-
teraction between the specific sen- 
sory systems in the reticular forma- 
tion; (c) interaction between the spe- 
cific and nonspecific systems. 

In studies of the relations between
the two nonspecific systems, the
arousal response initiated by the
brain stem reticular formation has
been shown to block the cortical re-
cruiting response evoked by the dif-
fuse thalamic nuclei (Gauthier, Par-
ma, & Zanchetti, 1956; Gellhorn, et
al., 1954; Jasper et al., 1955). Wheth-
er this was due to a direct desyn-
chronization of the thalamic nuclei
by the brain stem reticular forma- 
tion, to a prepotent effect upon the 
cortical neurones by the brain stem 
system, or merely to an inability to 
distinguish the two effects electro- 
graphically is not certain (Morrell & 
Jasper, 1956). If the blocking effect 
is a true one, it would seem to suggest 
that gross activation or arousal is
inimical to the optimal functioning 
of the regulatory effects mediated by 
the thalamic reticular system. This 
overshadowing of the more differ- 
entiated functions of the thalamic 
nuclei by the diffuse arousal response 
of the brain stem may have its be- 
havioral counterpart in the many
failures of discrimination which occur 
under high emotion and excitement. 

Both facilitatory and inhibitory in-
teractions among the ascending sen- 
sory systems, as well as between the 
ascending systems and the cortici- 
fugal projections, have been demon- 
strated in the reticular formation it- 
self. Simultaneous convergence of a 
peripheral sensory impulse and a cor- 
ticifugal potential upon a reticular 
unit led to facilitation of the reticular 
response, while a reticular neurone 
which had been fired by either a pre- 
ceding corticifugal or a peripheral 
sensory volley failed to respond to a 
subsequent sensory stimulus. It was 
noted that the depression of reticular 
response following repetitive cortical 
or afferent stimuli was particularly 
severe. Because of the density of re- 
ticular neurones, facilitatory field ef- 
fects occurred in neurones near those 
which were directly fired. These 
subliminal fringe areas then gave rise 
to summated potentials when fired 
by subsequent test stimuli (Amas- 
sian & Devito, 1954; Hernández-
Peón & Hagbarth, 1955; Moruzzi,
1954; Murphy & Gellhorn, 1945). 
The large degree of convergence of 
afferent and cortical projections upon
reticular units, and the suppressive 
and facilitatory effects to which this 
competition gives rise, produce an
area of sensory and cortical interac- 
tion to which simple input-output 
conceptions of nerve impulse trans- 
mission are not applicable. Reticular 
units may exhibit input without out- 
put (due to suppressive and occlusive 
effects), as well as output without in- 
put (due to a summation of spon- 
taneous activity), and thus may 
serve both integrative and pace-
making functions (Fessard, 1954). 
The full behavioral consequences of 
these properties, particularly with re- 
gard to levels of awareness and sen- 
sory interaction, remain to be inves- 
tigated. 

Interaction between specific and 
nonspecific systems within the cor- 
tex has also produced varied and
complex phenomena depending upon
the timing, intensity, and loci of
stimulation. The generalized cortical
activation produced by a single shock 
delivered to a diffuse thalamic nu- 
cleus had a facilitatory effect upon 
the responsiveness of the areas in- 
volved to subsequent specific sensory 
volleys (Li & Jasper, 1953). How- 
ever, with rapid and intense stimula- 
tion of the thalamic reticular system, 
both the secondary surface-negative 
wave of the specific evoked potential 
(which is dependent upon the activa- 
tion of the cortical apical dendrites) 
and the repetitive after-discharges 
(which are transmitted via thalamo- 
cortical reverberatory circuits) were 
abolished (Jasper: 1949, 1954; Jasper 
& Ajmone-Marson, 1952). Thus, the 
diffuse thalamic nuclei seemed cap- 
able both of facilitating the reception 
of the specific sensory impulses in the 
cortex, as indicated by the increase in 
the number of spike discharges to a 
stimulus, and of suppressing the
elaboration of these afferent impulses 
through cortical and thalamo-cortical 
circuits. 

Varied cortical effects have also 
been induced by brain stem reticular 
stimulation. On the one hand, the 
primary evoked potentials elicited by 
excitation of a peripheral nerve were 
reduced or blocked in the cortex by 
intense reticular arousal (Gauthier, 
et al., 1956). On the other, hypo- 
thalamic stimulation interacted with 
the specific sensory impulse to give 
both an intensification of the ampli- 
tude of the cortical evoked response 
and an increase in the area from 
which it was recorded (Gellhorn, et 
al., 1954). This interaction of hypo- 
thalamic stimulation with a specific 
sensory stimulus occurred chiefly 
within the corresponding projection 
area (i.e., hypothalamic interaction 
with an acoustic stimulus affected 
the auditory projection area), but, to 
a lesser extent, it increased the re- 
sponse of another region (i.e., the vis- 
ual area) to stimulation. The many 
apparent discrepancies in results 
from stimulation of the brain stem 
reticular formation may well be due 
to differences in the levels of anes- 
thesia and intensities of stimulation 
utilized, for both are critical factors 
in inducing variability of response. 
To cite an example, the generalized 
reticular activation induced by mild 
nociceptive stimuli enhanced the size 
of the primary auditory and vis-
ual potentials under light anesthesia. 
When depth of anesthesia was in- 
creased, a depression of these re- 
sponses occurred (Bernhaut, et al., 
1953). 

The extensive research of Gellhorn 
has been instrumental in clarifying 
many of the complexities associated 
with the interaction of specific and 
nonspecific impulses (Bernhaut, et 
al., 1953; Gellhorn: 1952, 1954; Gell- 
horn, et al., 1954; Nakao & Koella, 
1956). A key finding has been the 
ordering of the various sensory modes 
with respect to their effectiveness in 
inducing cortical activation (Bern- 
haut, et al., 1953). Nociception and 
proprioception have been found to 
induce the most intense and wide- 
spread cortical activation, with audi- 
tory and visual stimuli producing the 
least. Gellhorn (Bernhaut, et al., 
1953) also reported two kinds of cor- 
tical activation patterns in response 
to stimulation. The first, a general- 
ized arousal throughout all areas of 
the cortex, which was accompanied 
by excitation in the hypothalamic 
portion of the brain stem reticular 
system, occurred mainly to nocicep- 
tive and proprioceptive stimuli. The 
second, a specific activation pattern, 
with excitation confined to the spe- 
cific sensory projection area, was not 
accompanied by hypothalamic exci- 
tation in most instances. It was given 
predominantly by visual and audi- 
tory stimuli. The comparative order 
of effectiveness of the sensory modes 
in their ability to excite the reticular 
formation is particularly interesting 
when considered in relation to evi- 
dence concerning the relative ease of 
establishing both electrographic and 
behavioral conditioned responses to 
auditory as opposed to visual stimuli. 
Chow, et al. (1957), for example, have 
reported that an avoidance response 
which was established in 450 trials to 
light as the CS required only 150 
trials for tone. Morrell and Jasper 
(1956) gave a similar order of diffi- 
culty for conditioned alpha flicker re- 
sponses: the mean for visual CR’s was 
13.2 trials; for auditory, 8.2. A par- 
tial explanation of these results may 
lie in the finding that auditory stimuli 
are, in general, more effective activa- 
tors of the reticular formation and 
hence of the cortical projection and 
elaboration areas than are visual 
stimuli (Bernhaut, et al., 1953). If 
the formation of the memory trace is 
dependent upon a certain level of non- 
specific input, then a class of stimuli 
with a strong inherent capacity for 
providing its specific sensory com- 
ponents with a high level of reticular 
activation would be functionally 
equipped to produce faster learning 
than a class of stimuli dependent up- 
on random external sources for its 
elaboration. 
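
The size of the modality difference in the two conditioning studies cited above can be made explicit with a little arithmetic. The Python sketch below is an illustration added for that purpose only. The trial counts are those quoted in the text; the dictionary labels and the function name are hypothetical, and the two studies used different procedures, so the ratios are not directly comparable with one another.

```python
# Trials to criterion as quoted in the text; the labels are illustrative only.
trials = {
    "avoidance (Chow et al., 1957)": {"visual": 450, "auditory": 150},
    "alpha flicker CR (Morrell & Jasper, 1956)": {"visual": 13.2, "auditory": 8.2},
}


def visual_to_auditory_ratio(entry):
    """How many times more trials the visual CS required than the auditory CS."""
    return entry["visual"] / entry["auditory"]


ratios = {name: visual_to_auditory_ratio(entry) for name, entry in trials.items()}

for name, ratio in ratios.items():
    print(f"{name}: visual CS needed {ratio:.2f} times as many trials")
```

On these figures the visual CS required three times as many trials in the avoidance study and roughly 1.6 times as many in the electrographic study, so the direction of the difference is the same in both although its magnitude is not.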

Direct experimental evidence of 
the effect of interaction between the 
specific and nonspecific systems on 
the learning process is sparse. Chiles 
(1954) has reported that stimulation 
of the diffuse thalamic nuclei in- 
creased variability in a discrimina- 
tion task, and Gengerelli and Cullen 
(1955) have presented some evidence 
of increased learning as a result of 
stimulation of presumably cortical 
structures. The most impressive 
work to date has been that of Mahut 
(1957). Rats were run in the Hebb- 
Williams maze under hunger motiva- 
tion for 10 trials a day. Immediately 
following each trial, they were stim- 
ulated for 15 seconds while eating in 
the goal box with .25 volt, 60 cycle 
current, delivered through bipolar 
electrodes in the intralaminar and 
midline thalamic nuclei. No visible 
disruption of feeding behavior oc- 
curred with this procedure. How- 
ever, there was a highly significant 
impairment in learning for the experi- 
mental rats as compared to control 
animals. Additional controls indi- 
cated that there was no difference in 
the trial latencies for the two groups, 
and that stimulation itself carried 
neither pleasurable nor aversive qual- 
ities since the experimental animals 
gave only spontaneous bar pressing 
rates when tested for self-stimulation 
in the Skinner Box. Mahut has inter- 
preted her results as indicating an in- 
terference with the neural circuits in- 
volved in the memory trace. 

The relatively small currents which 
were utilized in Mahut’s study, and 
the striking decrements which were 
produced by stimulation immediately 
following a trial suggest the possi- 
bility that massive peripheral stim- 
uli capable of activating the reticular 
system to a similar extent might also 
produce decrements in memory if ap- 
plied during the critical time inter- 
val. The massive nonspecific input 
associated with severe punishment, 
for example, may affect learning by 
disrupting the consolidation process 
of the preceding response. In a mul- 
tiple choice situation with punish- 
ment for errors, this would have the 
effect of disrupting the neural trace 
of the immediately antecedent in- 
correct response, while permitting 
the consolidation of the alternative 
traces to proceed comparatively un- 
affected. This schema would also ac- 
count for the relative efficiency of 
spaced as opposed to massed learning 
under punishment. If the neural 
trace is most vulnerable to interfer- 
ence during a certain limited critical 
interval, then massing trials would 
increase the probability of exposure 
of the trace associated with the non- 
punished response to the disruptive 
effects of high levels of nonspecific 
input. These speculations are given 
some support in a study by Duncan 
(1949), who administered traumatic 
shocks of 85 volts to the hind legs of 
rats at intervals of 20 seconds, 60 
seconds, 4 minutes, and 45 minutes 
after one trial per day in an avoid- 
ance box. At the end of 18 days, the 
20-second group showed a significant 
impairment in learning which was of 
a magnitude similar to that of an ex- 
perimental group given electrocon- 
vulsive shock of equal voltage. The 
other three groups showed no signifi- 
cant decrement. 

The precise perceptual correlates 
of interaction also remain ambigu- 
ous. A possible relationship between 
levels of reticular activation and per- 
ceived brightness is suggested by evi- 
dence that stimulation of the mid- 
brain reticular formation led to a 
great facilitation of the response of 
individual retinal units to a test flash 
of light (Granit, 1955). This facili- 
tation involved both an increase in 
impulse frequency and an extended 
duration of discharge. Since impulse 
frequency is directly related to stim- 
ulus intensity, which is correlated 
with perceived brightness, level of 
reticular activation may be one of the 
determinants of perceived brightness. 
A study by Fuster (1958) lends some 
support to this hypothesis. Monkeys 
stimulated in the midbrain reticular 
formation made a higher percentage 
of correct responses and showed 
shorter reaction times in a discrimina- 
tion problem involving the presenta- 
tion of stereometric objects at expo- 
sures ranging from 10-40 milliseconds 
than did the control group. However, 
the failure to control for pupillary re- 
sponse, the absence of recordings of 
the specific evoked potential in the 
specific sensory tracts and cortical 
receiving areas, and the application 
of the electrical stimulus during the 
response interval, make it impossible 
to decide whether the facilitation is a 
function of the peripheral receptor, 
the central organization of the per- 
ception, or the motor response. 

Sensory deprivation and photic 
driving studies have also provided 
rich data for conjecture. Heron, 
Doane, and Scott (1956) noted that 
their eight Ss who reported hallucina- 
tions during isolation showed EEG’s 
containing slow frequency delta waves 
both during and after the isolation 
period. Hallucinatory activity as a 
consequence of photic driving has 
also been a frequent occurrence, and, 
in one case, at least, these vivid vis- 
ual images were accompanied by 
high voltage, irregular slow waves at 
a frequency from 4-8 cycles per sec- 
ond in the temporal and temporo- 
occipital area (Mundy-Castle, 1953). 
These studies seem to indicate a rela- 
tionship between low level nonspecific 
input, either as a consequence of lack 
of sensory input or from synchronous 
“driving” stimuli, and hallucinatory 
activity. Whether the hallucinatory 
activity which is coincident to so 
many psychotic states is also a result 
of low level, relatively synchronized 
nonspecific activity, perhaps induced 
by centrally initiated sensory cutoff, 
remains a problem for further inves- 
tigation. 

The studies reviewed in this sec- 
tion provide strong neurophysiologi- 
cal evidence of interaction between 
the specific sensory and nonspecific 
reticular systems. There can be little 
doubt that the size and frequency of 
the specific evoked potential are af- 
fected by activity in the nonspecific 
activating systems. Despite the 
stimulating speculations of Hebb 
(1949; 1955) and Gellhorn (1952; 
1954), however, the actual effects of 
this interaction upon behavioral phe- 
nomena such as attention, percep- 
tion, memory, etc., remain essentially 
unknown and constitute a challenge 
to the ingenuity and perseverance of 
psychologists. 


CENTRAL CONTROL OF AFFERENT 
INPUT 

The restrictive and selective nature 
of attentional processes has long been 
recognized by psychologists and psy- 
choanalysts alike. Proponents of 
nonreinforcement theories of learn- 
ing, for example, have attempted to 
explain the failure of “latent” learn- 
ing under conditions of strong drive 
by arguing that the irrelevant incen- 
tive was not perceived under high 
motivation (Thistlethwaite, 1951), 
while psychoanalytic theorists have 
proposed the existence of a “stimulus 
barrier,” which permits the organism 
to protect itself against traumatic 
stimuli by shutting off the function 
of perception (Fenichel, 1945). The 
operation of these mechanisms has 
been generally assumed to occur 
either prior to the impingement of the 
stimulus upon the peripheral re- 
ceptor (i.e., the animal did not “look 
at” or “see” the relevant discrim- 
inanda), or subsequent to its arrival 
in the higher centers of the brain 
(i.e., the organism did not “pay at- 
tention” to what it “saw”). In other 
words, information transmitted by 
the specific pathways was thought to 
remain constant throughout its course 
subsequent to its reception at the 
peripheral sense organ and prior to 
its elaboration in the cortex. That 
nondecremental transmission of in- 
put is far from universal has been 
consistently demonstrated by recent 
work on the centrifugal regulation of 
afferent influx, which indicates that 
sensory impulses may be regulated 
and controlled at every level from the 
receptor upward. 

Central control of afferent impulses 
at a higher nervous system level was 
first demonstrated by Granit and 
Kaada (1952) with respect to the 
muscle spindle—a proprioceptive re- 
ceptor. Both facilitation and inhibi- 
tion of discharge were obtained by 
stimulation of gamma efferent fibers 
through the brain stem reticular for- 
mation (Granit & Henatsch, 1956). 
Facilitatory and inhibitory effects 
have also been reported in retinal 
ganglion cells and in the optic tract 
as a result of reticular stimulation 
(Dodt, 1956; Granit: 1955a, 1955b). 
In the spinal cord, stimulation of the 
bulbar and midbrain reticular forma- 
tion, the anterior cingulate gyrus, and 
the sensori-motor cortex depressed 
or inhibited conduction in both the 
dorsal and ventral columns, which in- 
clude fibers mediating both kines- 
thetic and cutaneous pressure (Hag- 
barth & Kerr, 1954). Tactile impulses 
were also depressed for periods as long 
as 80 seconds at the level of the gracile 
nucleus of the medulla by stimula- 
tion of reticular areas and the sensori- 
motor cortex (Scherrer & Hernández- 
Peón, 1955). That these effects may 
be obtained from the sensori-motor 
area as well as the anterior cingulate 
gyrus indicates that cortical projec- 
tions to the reticular formation are 
capable of initiating the regulation of 
specific evoked potentials, as well as 
of blocking conduction of nonspecific 
impulses within the reticular forma- 
tion itself (Adey, Segundo, & Living- 
ston, 1957). Transmission in the 
olfactory and auditory pathways was 
also found to be subject to central 
regulation and control. Excitation 
of the amygdala, the prepyriform 
cortex, and the anterior commissure 
exerted a depressive influence on elec- 
trical activity in the olfactory bulb 
(Kerr & Hagbarth, 1955), and stimu- 
lation of the bulbar reticular forma- 
tion suppressed the response of 
the cochlea to auditory stimuli (Ga- 
lambos, 1956). It would appear, 
then, that all sense modalities have 
some means of centrifugal control, 
either at the level of the receptor it- 
self, at the first or second synapses, 
or at some more centrally located sta- 
tion along the afferent pathways 
(Lindsley, 1956). 

That these suppressive and facili- 
tatory effects are not merely artifacts 
induced by unphysiological electrical 
stimulation has now been shown by 
experiments utilizing natural stimuli 
and unanesthetized animals with 
chronically implanted electrodes in 
the sensory projection paths. Her- 
nández-Peón, Scherrer, and Jouvet 
(1956), recording the responses of the 
dorsal cochlear nucleus to auditory 
clicks, reported that these specific 
evoked potentials were practically 
abolished when a visual stimulus— 
two mice in a closed bottle—elicited 
behavioral evidence of attention from 
the cat. When the mice were re- 
moved, the auditory responses re- 
turned to the same order of magni- 
tude as the original control responses. 
Similarly, olfactory stimuli and a 
nociceptive shock, which apparently 
distracted the animal’s attention, re- 
sulted in a reduction of auditory 
evoked potentials in the cochlear 
nucleus. If it is valid to assume that 
subjective awareness of a stimulus is 
contingent upon the transmission of 
its concomitant evoked potential 
through the specific projection path- 
ways to higher diencephalic and cor- 
tical centers, then it is possible to con- 
clude that the cat’s “hearing” of the 
click was disturbed when it was dis- 
tracted by other stimuli. Photically 
evoked potentials were also reduced 
or abolished when the animal focused 
on an acoustic or olfactory stimulus 
(Hernández-Peón, Guzmán-Flores, 
Alcaraz, & Fernández-Guardiola, 1957). 
This reduction in the magnitude of 
the visual evoked potential occurred 
both within the specific sensory path- 
ways (i.e., the optic tract, lateral ge- 
niculate body, and striate cortex) and 
in the midbrain reticular formation. 
Since it occurred at a level peripheral 
to the optic tract, the blocking effect 
was assumed to take place in the 
retina. A similar blockade of photic 
potentials was observed when atten- 
tive behavior was elicited by stimu- 
lating the brain stem reticular forma- 
tion. 

A further correlation between af- 
ferent signals and conscious sensation 
has been reported by Hernández- 
Peón and Donoso (1957) with human 
Ss. Four patients with electrodes 
chronically implanted in the occipital 
lobes were subjected to a series of 
flashes of constant intensity. In gen- 
eral, the magnitude of the evoked po- 
tentials which were recorded varied 
with the reported perception of the 
intensity of the light. When the pa- 
tients’ attention was engaged by the 
presentation of such mental tasks as 
arithmetic problems or instructions 
to recall visual imagery, the visual 
evoked potential was reduced or 
abolished. The potentials recovered 
their original size after the comple- 
tion of the task. 

Analysis of the data on regulation 
of sensory input (including that re- 
viewed in the previous section on in- 
teraction of specific and nonspecific 
impulses) indicates that there are 
mechanisms available for three types 
of control of input: 

1. At the level of the sensory re- 
ceptor, the spinal cord, and in the 
specific sensory paths prior to the 
point at which they give off collat- 
erals to the reticular formation, both 
the arousal and the cue effects of 
stimuli may be controlled. 

2. In the reticular formation itself, 
the arousal effects of stimuli may be 
enhanced or inhibited. 

3. In the cortex, the cue value of 
stimuli may be affected. 

These mechanisms of sensory con- 
trol provide a neurophysiological 
basis for phenomena such as the re- 
pressive defences, concentration, hys- 
terical anesthesias, etc., which in- 
volve a selective restriction of sensory 
input in their operation. They also 
indicate that the interpretation of 
cortical events must be undertaken 
with caution in the absence of record- 
ings of afferent influx to the cortex. 

This section has dealt with central 
control of afferent input at a reticular 
level. However, the complex discrim- 
inative behavior of interest to psy- 
chologists is generally assumed to in- 
volve the cortex. It is highly rele- 
vant, therefore, to inquire to what ex- 
tent cortical areas may exert an ef- 
fect upon the reticular formation and 
its regulatory mechanisms and thus 
participate in the control of periph- 
eral sensory input. The following 
section will review the existing litera- 
ture which bears on this question. 


CORTICAL PROJECTIONS TO THE 
RETICULAR SYSTEM 


The tendency of many behavior 
theories to base their motivational 
constructs upon the so-called “pri- 
mary” biological needs (Hull, 1943), 
to the exclusion of autonomous cog- 
nitive processes, has been subjected 
to increasing criticism of late. This 
revival of interest in cognitive moti- 
vation by psychologists has found 
ample support from neurophysiology, 
where investigations of cortical pro- 
jections to the reticular formation 
strikingly demonstrate the fallacy of 
theories which categorize higher men- 
tal functions as subordinate deriva- 
tives of more “basic” need states. 

Within the past few years, the im- 
portance of cortical projections to the 
brain stem reticular formation and 
the diffuse thalamic nuclei has been 
repeatedly confirmed by studies 
which indicate that the potentials in- 
duced throughout the reticular sys- 
tems by cortical projections are 
larger, more widespread, and have a 
shorter latency than those evoked 
directly by any sensory mode (Her- 
nández-Peón & Hagbarth, 1955). 
These cortical connections to the 
reticular formation arise only in cer- 
tain limited regions: the frontal ocu- 
lomotor area, the orbital surface of 
the frontal lobe, the sensori-motor 
cortex, the superior temporal gyrus, 
the cingulate gyrus, and the hippo- 
campal gyrus (French & Hernández- 
Peón, 1955; French, Livingston, & 
Hernández-Peón, 1953; Jasper, et al., 
1952; Livingston, French, & Hernán- 
dez-Peón, 1953; Segundo, Naquet, & 
Buser, 1955). Stimulation of the last 
three areas has resulted in the most 
intense and widespread arousal of all 
points explored (Segundo, Naquet, & 
Buser, 1954), and it is particularly 
interesting to note that these struc- 
tures are currently believed to be 
critical to the elaboration of emo- 
tional and memory processes (Jasper, 
Gloor, & Milner, 1956; Lindsley, 
1956). 

The importance of these cortical 
connections can scarcely be overem- 
phasized, for they provide a means 
whereby the cortex can control the 
activating mechanisms of the brain 
stem and thus influence its own level 
of arousal (French & Hernández- 
Peón, 1955). This effect is particu- 
larly relevant to sleep phenomena 
(Bremer, 1954) and to the role of 
learned stimuli in directing behavior. 
The efficacy of cortical processes in 
inducing wakefulness has been con- 
firmed by studies in which threshold 
electrical stimulation of areas with 
projections to the reticular formation 
aroused a sleeping animal and pro- 
duced cortical desynchronization just 
as effectively as an intense peripheral 
sensory stimulus (Segundo, Arana, & 
French, 1955). Of even greater im- 
port to behavior is the role of cortical 
projections in providing a mediating 
mechanism whereby learned, mean- 
ingful stimuli may influence the or- 
ganism's activity in the waking state. 
That this influence is a powerful one 
is evident even in the behavior of 
relatively “ungifted” animals. Thus, 
the appearance of a human being may 
come to elicit a far more consistent 
and intense arousal from the rabbit 
than strong sensory stimuli such as 
loud noises and bright lights (Gan- 
gloff & Monnier, 1956). 

Although corticifugal fibers to the 
reticular formation originate in dis- 
crete cortical areas, they terminate in 
overlapping projection areas within 
the reticular system (Amassian & 
Devito, 1954; Hernández-Peón & 
Hagbarth, 1955; Moruzzi, 1954; 
Scheibel, et al., 1955). There is, 
therefore, an extensive degree of con- 
vergence of both cortical and afferent 
impulses upon individual reticular 
units. However, single unit analysis 
of reticular neurones has indicated 
that individual cells respond with dif- 
ferent patterns and latencies of firing, 
depending upon the source of stimu- 
lation (Amassian & Devito, 1954; 
Hernández-Peón & Hagbarth, 1955). 
Information may also be conveyed 
by differential patterns of inactive as 
well as active units within the system 
as a whole (Adey, et al., 1957; French 
& Hernández-Peón, 1955; Hernán- 
dez-Peón & Hagbarth, 1955). These 
findings lend substance to the con- 
clusion that “the identification of 
unique factors corresponding to each 
corticifugal path impels one to leave 
open the possibility that the tem- 
poral configuration of activity may 
provide a code for specificity of in- 
formation conveyance even to a dif- 
fusely projecting system” (Adey, et 
al., 1957). 

Comparison of transmission laten- 
cies in the specific and nonspecific 
systems indicates that impulse veloc- 
ities are faster in the specific sensory 
pathways than in the reticular for- 
mation. For example, the latency of 
the specific evoked potential in the 
sensori-motor cortex following stimu- 
lation of the sciatic nerve was 9-10 
msec. In the midbrain reticular for- 
mation, the conduction times ranged 
from 13-23 msec. (French, et al., 
1953). This difference of 4-14 msec., 
when compared to the 6-12 msec. la- 
tencies of potentials from cortical 
areas to the reticular formation 
(French & Hernández-Peón, 1955), 
suggests that there is time for a 
stimulus to reach the cortex via the 
specific paths and then relay down to 
the reticular formation in time to af- 
fect its own arousal properties. Ex- 
perimental data in support of this 
supposition have been presented by 
Ingvar and Hunter (1955), who re- 
corded brain stem responses to optic 
stimuli in intact cats and in animals 
with chronic bilateral ablations of the 
visual cortex. In the ablated prep- 
arations, it was possible to trace a 
central diencephalic pathway for 
light responses which coincided with 
the thalamic reticular system. The 
mean latency of these responses was 
35 msec., with a range from 28-45 
msec., indicating a slow conduction 
velocity. Although intact prepara- 
tions also showed responses to visual 
stimulation in the diencephalon, no 
distinct central pathway was found 
within the thalamic reticular system 
and the latencies of these brain stem 
potentials were generally in the range 
of 20 to 30 msec. The authors have 
interpreted these shorter latency 
brain responses in the intact animals 
as due to corticifugal influences from 
the visual cortex on the brain stem. 
From a study of the time relations 
involved, they believed it to be possi- 
ble for impulses from the occipital 
areas to influence brain stem poten- 
tials initiated by direct optic collat- 
erals at the pretectal and collicular 
levels, and for the cortex thus to con- 
trol events elicited by visual stimuli 
in the nonspecific pathways of the 
brain stem. 
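
The timing argument at the start of this paragraph can be checked with simple interval arithmetic. The Python sketch below is an illustration added for that purpose and is not part of the studies cited. It assumes that latencies along successive legs of a pathway simply add, and the variable and function names are hypothetical. The 4-14 msec difference quoted in the text is reproduced when the 9 msec lower bound of the specific latency is subtracted from both ends of the reticular range.

```python
# Latency ranges quoted in the text, in msec; names are illustrative only.
SPECIFIC_TO_CORTEX = (9, 10)      # sciatic nerve to sensori-motor cortex
SCIATIC_TO_RETICULAR = (13, 23)   # sciatic nerve to midbrain reticular formation
CORTEX_TO_RETICULAR = (6, 12)     # cortical areas to reticular formation


def arrival_window(first_leg, second_leg):
    """Earliest and latest arrival for two legs travelled in series (assumed additive)."""
    return (first_leg[0] + second_leg[0], first_leg[1] + second_leg[1])


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Arrival of a cortically relayed volley in the reticular formation.
via_cortex = arrival_window(SPECIFIC_TO_CORTEX, CORTEX_TO_RETICULAR)

# Head start of the specific path, taken from its 9 msec lower bound.
head_start = (SCIATIC_TO_RETICULAR[0] - SPECIFIC_TO_CORTEX[0],
              SCIATIC_TO_RETICULAR[1] - SPECIFIC_TO_CORTEX[0])

relay_in_time = overlaps(via_cortex, SCIATIC_TO_RETICULAR)

print("cortical relay arrives at", via_cortex, "msec")
print("direct reticular response at", SCIATIC_TO_RETICULAR, "msec")
print("head start of specific path:", head_start, "msec")
print("relay window overlaps direct response:", relay_in_time)
```

Under the additive assumption the cortically relayed volley would arrive at 15-22 msec, which falls inside the 13-23 msec range of the direct reticular response. This agrees with the suggestion that the cortex has time to act on the reticular response to the same stimulus.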

The potential extent of this con- 
trol has been strikingly demon- 
strated by Adey, Segundo, and Liv- 
ingston (1957), who reported that 
stimulation of the cortical areas pre- 
viously listed (the hippocampal gyrus 
and the temporal gyrus being the 
most effective of these) blocked 
conduction in the reticular formation 
between the midbrain and the thala- 
mus. In the region of the hippo- 
campal gyrus, for example, single 
cortical shocks induced profound 
blocking interaction in the reticular 
formation lasting for two seconds. 
These two studies should be of 
particular interest to psychologists, 
for not only do they provide a neuro- 
physiological basis for phenomena 
involving aberrations of conscious- 
ness and memory under emotional 
stress, but they strongly indicate the 
critical role of the cortex, with its 
highly discriminative properties, in 
the selection and transmission of 
sensory input. If the perception of 
complex stimuli requires extensive 
supportive elaboration from nonspe- 
cific sources in order to be retained 
as conscious memory, and if reticular 
input can be blocked during this pe- 
riod of consolidation by a discrimina- 
tive center which is capable of moni- 
toring its own input, then theoretical 
constructs such as “subception,” 
“perceptual defense,” and “repres- 
sion” may have more validity than 
their critics have yet been prepared 
to admit. Repression, for example, 
has been conceptualized in terms of 
two components—a withdrawal or 
expulsion from consciousness of the 
ideational representation of the dan- 
gerous impulse, and a “warding off” of 
any external stimulus which, by asso- 
ciation with the repressed thought, 
would restore it to consciousness 
(Fenichel, 1945). Neurophysiologi- 
cally, it may well be possible to dis- 
turb ideation by a blocking of reticu- 
lar conduction which would produce 
a transitory change in the level of 
consciousness, while the reception 
and transmission of external sensory 
stimuli could be disrupted by reticu- 
lar control of peripheral afferent in- 
put within the specific sensory paths. 
These suppositions are highly specu- 
lative, but do serve to illustrate some 
possible applications of reticular 
mechanisms to behavioral phenom- 
ena. 


THE RETICULAR SYSTEM AND THE 
LEARNING PROCESS 


Although isomorphism between be- 
havior and brain processes is not an 
essential condition of their interrelat- 
edness, evidence of the same varia- 
bility and plasticity which charac- 
terize behavior has been sought by 
theorists in the neural systems which 
supposedly underlie this behavior. It 
is of interest then to inquire: (a) to 
what extent the reticular formation is 
involved in the learning process; (b) 
to what degree its activity is capable 
of modification; (c) whether changes 
in reticular activity are initiated 
within the reticular system itself, or 
whether they are controlled by higher 
centers. 

Many of the studies seeking to in- 
vestigate the relationship of brain 
activity to learning have utilized con- 
ditioning of the alpha rhythm as 
their operational technique. The pro- 
cedure for alpha conditioning in- 
volves the pairing of a conditioned 
stimulus, either auditory, visual, or 
tactile, with the unconditioned stim- 
ulus of a flickering light. The uncon- 
ditioned response is a blocking or de- 
synchronization of the alpha rhythm 
to high frequencies of stimulation, or 
a photic driving at the frequency of 
the flashing light for stimuli between 
six and twelve cycles per second. 
Morrell and Jasper (1956) have found 
that following a period of adaptation 
to the CS, at which time its presenta- 
tion no longer evoked recordable 
electrical response, conditioning to 
the paired CS and US occurred in 
three stages: an initial generalized 
blocking or desynchronization to the 
CS which appeared simultaneously 
throughout the cortex; an interven- 
ing phase of localized discharge 
which was frequency-specific to the 
unconditioned stimulus; and a final 
stage of desynchronization which 
was mainly localized in the occipital 
cortex, and, to a lesser extent, in the 
surrounding parietal and posterior 
temporal areas. These changes were 
specific to the particular conditioned 
stimulus utilized and, once estab- 
lished, did not generalize either 
within or among sensory modes. 
Electrographic conditioning ex- 
periments have presented evidence of 
highly consistent and characteristic 
changes in alpha following the pres- 
entation of a CS. The methodologi- 
cal similarity of this type of condi- 
tioning to behavioral learning is not, 
however, sufficient basis for assum- 
ing that conditioned flicker dis- 
charges are the neural equivalents of 
behavioral responses. In a procedure 
involving two training stages, Chow, 
Dement, and John (1957) first condi- 
tioned three adult cats to perform an 
avoidance response in a double grill 
box with flickering light as the CS 
and an electric shock as the US. The 
CS evoked photic driving in the 
electrocorticogram and the US forced 
the cats to cross over to another com- 
partment in the box. After repeated 
paired presentations, the flicker by 
itself elicited both the ECG repeti- 
tive discharge and the behavioral 
crossing. The cats were then trained 
to a conditioned ECG response in an 
animal holder, with a tone as CS and 
the flickering light as US. After the 
cats acquired both these CRs, they 
were returned to the double grid box 
to test whether the conditioned ECG 
discharge would be associated with 
the behavioral response. Tone alone 
was presented: it evoked the ECG 
change but did not lead to behavioral 
crossing. The authors concluded 
that “Establishment of partial equiv- 
alence between two different stimuli 
via cortical conditioning, using more 
or less similar electrical responses to 
both stimuli as the indicator, is not 
sufficient to establish overt behav- 
ioral equivalence between these stim- 
uli, using the manifestation of the 
conditioned avoidance response as 
the behavioral criterion.” 

Despite an apparent lack of equiv- 
alence between the electrical activity 
of the brain, as recorded by gross 
EEG methods, and behavioral re- 
sponses, electrographic conditioning 
does provide a valuable tracer mech- 
anism for following and analyzing the 
functional changes in brain activity 
which accompany the learning proc- 
ess. Utilizing recordings of condi- 
tioned electrical activity from corti- 
cal and subcortical structures, Yoshii, 
Pruvot, and Gastaut (1957) have 
presented evidence that the repeti- 
tive discharges at the frequency of 
the CS were earlier in onset, higher 
in amplitude, and more stable in oc- 
currence in subcortical structures, 
particularly in the midbrain reticular 
formation, than in the occipital cor- 
tex. As conditioning proceeded, the 
cortex became progressively more 
synchronized with the reticular for- 
mation until the relationship be- 
tween them almost reached identity. 
The critical role of the midbrain 
reticular formation is also sup- 
ported by the results of Hernández- 
Peón, Brust-Carmona, Eckhaus, 
López-Mendoza, and Alcocer-Cuarón 
(1956), who established a conditioned 
salivation to visual and tactile stimuli, 
and then made restricted lesions in a 
number of subcortical structures, in- 
cluding the midbrain reticular forma- 
tion, septal area, medial thalamic 
nuclei, superior colliculi, etc. Only 
the lesions in the midbrain reticular 
formation, which never resulted in 
coma, abolished or reduced the con- 
ditioned salivary response in the 
awake animal. Since unconditioned 
salivation was unaffected or even en- 
hanced after lesion, the authors con- 
cluded that “learning seems to re- 
quire the functional integrity of the 
brain stem reticular system.” 
Habituation studies on both the 
arousal response and the specific
evoked potential have also proved 
fruitful in analyzing the role of the 
reticular system in learning. Record- 
ing from naturally sleeping, unan- 
esthetized cats with permanently im- 
planted electrodes in cortical and 
subcortical structures, Sharpless and 
Jasper (1956) found that habituation 
of the arousal response to simple 
tones occurred rapidly in intact ani- 
mals. The habituation was fre- 
quency-specific, although it also 
showed some degree of generaliza- 
tion. Many of the intact animals also 
gave an habituation response which 
was specific to the particular pattern 
of tone utilized, although changes in 
pattern were less effective in produc- 
ing arousal after adaptation than 
changes in simple tones. Selective 
lesions within the specific sensory 
paths produced differential effects 
upon adaptation. Removal of the 
cortex abolished pattern-specific ha- 
bituation; while transection below 
the geniculate bodies destroyed both 
pattern and frequency-specific ha- 
bituation. The habituation of the 
arousal response was found to be in- 
dependent of changes in the primary 
sensory pathways, since the specific 
evoked potentials could still be ob- 
tained from the cortical projection 
areas after the stimulus had lost its 
power to elicit generalized activation. 
Sharpless and Jasper concluded that
habituation of arousal occurs within 
“the brain stem reticular and the un- 
specific thalamic systems with their 
associated collateral pathways.” 
Whether habituation is an autono- 
mously initiated activity of the retic- 
ular system in response to informa- 
tion transmitted directly by the 
classical sensory paths, or whether it 
is controlled by higher centers is not 
known at this time. The authors did 
present evidence that adaptation re- 
mained frequency-specific despite 
cortical removal for cats with collat- 
erals from intact medial geniculate 
bodies into the diffuse thalamic nu- 
clei. In contrast, adaptation was spe- 
cific only to intensity and sensory 
mode for those animals who, due to 
transection of the specific sensory 
pathways above the colliculi, pre- 
sumably retained functional collat- 
erals solely to the brain stem reticu- 
lar formation. Whether this differ- 
ence in specificity between the thala- 
mic and brain stem reticular systems 
represents a real distinction between 
the discriminative properties of the 
systems, or merely reflects variation 
in the amount of information trans- 
mitted by the specific sensory collat- 
erals at the two levels is unknown. 
Nevertheless, it is tempting to specu- 
late upon the possible implications of 
these results for the concept of gen- 
eralization. The Sharpless and Jasper 
data would seem to indicate the pos- 
sibility that two different neurologi- 
cal mechanisms are involved in the 
generalization of arousal at the non- 
specific level, with intensity gen- 
eralization dependent upon the ac- 
tivity of the brain stem reticular for- 
mation and quality discrimination a 
function of the thalamic reticular 
system. If this distinction is valid, 
then analysis of the functioning of 
these two systems may clarify the 
conditions under which generaliza-
tion reflects a differentiated response
to dimensional similarities, as dis- 
tinct from the occasions when it 
merely represents a failure of dis- 
crimination. If finer discriminations 
are indeed related to thalamic mech- 
anisms, then the prepotence of brain 
stem reticular arousal over thalamic 
activity under conditions of high ac-
tivation would suggest a possible
neural mechanism underlying the 
failures of discrimination which occur 
under conditions of high drive. 

The studies cited thus far have 
been concerned with modification of 
activity within the reticular system, 
either directly or as indexed by the 
arousal response. There is also evi- 
dence of changes of function within 
the specific pathways as a result of 
reticular control. Galambos, Sheatz, 
and Vernier (1956) presented con- 
tinuous auditory clicks to cats over 
extended periods of time while re- 
cording from the auditory and visual 
cortex, cochlear nucleus, hippocam- 
pus, septal area, and amygdala. After 
the animals had been subjected to 
the stimuli for hours or days, evoked 
potentials at all loci either disap- 
peared or else were small and irregu- 
lar in nature. Concomitantly, there 
was a lack of consistent behavior to- 
ward the stimuli. After 10 to 20 
strong shocks had been paired with 
the clicks, recordings from the coch- 
lear nucleus, as well as from the 
other sites, showed augmentation of 
the evoked potential. Behavioral 
changes, such as crouching, alertness, 
snarling, etc., appeared concomit- 
antly. During the extinction process, 
both behavioral and electrophysio- 
logical responses disappeared, with 
motor responses extinguishing prior 
to the evoked potentials. Thus, be- 
havioral conditioning and extinction 
would appear to be accompanied 
by consistent electrophysiological 
changes in the specific sensory pro-
jections, the reticular activating sys-
tems, and the rhinencephalic struc-
tures. That these changes within the
specific pathways were a consequence 
of reticular activity would seem a log- 
ical conclusion on the basis of the evi- 
dence that electrical stimulation of 
the midbrain reticular system led to 
depression of the auditory evoked 
potential in the cochlear nucleus. 

Both photic habituation (Hernán-
dez-Peón, Guzmán-Flores, Alcaraz,
& Fernández-Guardiola, 1956, 1957)
and acoustic habituation (Hernán-
dez-Peón, et al., 1957; Hernández-
Peón & Sherrer, 1955; Hernández-
Peón, Sherrer, & Jouvet, 1956) have
been reported. The auditory habitu- 
ation occurred to sounds of constant 
intensity, repeated thousands of 
times. Habituation continued when 
the animal was asleep and was selec- 
tive to the particular frequency of 
sound used. Once habituation had 
been established, recovery in the 
magnitude of the evoked potential 
was found to occur under the follow- 
ing conditions: (a) after a period of 
rest following discontinuation of the 
habituation stimulus; (b) after sudden
and intense acoustic stimuli; (c) after 
pairing with electrical shock; (d) 
after lesions of the midbrain reticular 
formation; and (e) under pentobarbi- 
tal anesthesia, which depresses the 
activity of the reticular system. The 
release of habituation under anes- 
thesia and following brain stem le- 
sions indicated that the reticular sys- 
tem was critically involved in habitu- 
ation, but whether it functioned 
autonomously or merely as a way sta- 
tion for cortical control remained un- 
determined. 

In summary, then, the studies re- 
viewed in this section confirm the 
critical role of the reticular structures 
in the learning process, but fail to 
clarify the extent of their functional
autonomy. Unfortunately, analysis
of the relative contributions of the 
specific sensory structures and the 
reticular system to adaptation and 
generalization involves many experi-
mental difficulties. The resolution
of these problems will go far toward 
clarifying the nature of processes 
which are fundamental to learning.


CONCLUSION 


Studies of the reticular formation 
indicate that its structural complex- 
ity and functional plasticity override 
the conceptual limitations inherent 
in more static, reflex-like neural 
mechanisms. These characteristics 
permit it to exert facilitatory and 
suppressive effects which have a 
time-course of seconds and even min- 
utes on the activity of central nerv- 
ous system structures. This span is 
comparable to that of many behav- 
ioral events. 

The neurophysiological distinction 
between “specific” and “nonspecific”
systems is particularly relevant to 
psychological theory. Constructs 
such as attention, perception, moti- 
vation, drive, reward, and punish- 
ment possess a common factor of non- 
specific reticular activation in addi- 
tion to their specific properties. This 
general factor of nonspecific activity 
has effects which are lawfully related 
to the timing and intensity of its ap- 
plication. It is essential, therefore, 
that psychological constructs be criti- 
cally evaluated in an attempt to de- 
termine the extent to which they are 
a function of “nonspecific” as well as
“specific” factors. Conceptual reas-
sessment may well indicate that cate-
gories now regarded as independent
and mutually exclusive in terms of 
operational criteria are functionally 
interrelated on the basis of a common 
factor of reticular activation. 
Whenever an experiment involves
collecting data from more than two 
groups or under more than two condi- 
tions we become involved in the prob- 
lem of multiple comparisons—the 
problem of comparing each group 
with every other group or arranging 
the results in rank order. This be- 
comes a problem when we wish to 
assign a level of confidence or signifi- 
cance to our conclusions about the 
relationships among all of the popu- 
lations involved. Classical methods 
such as the F test permit us only to 
reject the over-all null hypothesis 
that all of the means are equal but 
they do not provide a procedure for 
comparing specific means with each 
other. 

In the older psychological litera- 
ture, this problem has been dealt 
with in a haphazard manner, without 
recognizing the issues involved. More 
recently, statistical procedures spe- 
cifically designed for multiple com- 
parisons have become available and 
have been discussed briefly in the 
psychological journals (McHugh & 
Ellis, 1955; Stanley, 1957). It has 
not been clear to many psycholo- 
gists, however, that there are several 
different methods with different basic 
assumptions or approaches. There
are important questions of logic in- 
volved in the use of these methods 
and these issues have not been clearly 
faced in the psychological literature. 
This is partly because many of the 
papers by statisticians on this sub-
ject are in sources which are inacces-
sible or rarely used by psychologists.
In particular, one of the most exten-
sive discussions of the logical prob-
lems of multiple comparisons, that of
J. W. Tukey, has been available only
in a privately circulated paper.²
Other aspects of the problem have
not been dealt with at all, to this
writer’s knowledge, so that it seems
to be time for an attempt to survey
the problem systematically.

1 The writer wishes to express his appreci-
ation to Urie Bronfenbrenner and W. T. Fe-
derer for their detailed comments and sug-
gestions upon an earlier draft of this paper.

The emphasis here is upon ques- 
tions of logic rather than specific 
methods of computation. For the 
latter, we shall simply refer to appro- 
priate sources, after we have tried to 
make clear the implications of choos- 
ing to use a particular method or set 
of tables. 

Multiple comparisons and other
multiple tests. Multiple comparisons 
are only one instance of the use of 
multiple statistical tests in a single 
piece of research. We shall not have 
space to deal explicitly with the other 
tests except as we need to distinguish 
them from the problem of multiple 
comparisons. One of the sources of 
confusion in the past has been the 
failure to distinguish one kind of 
multiple testing from another. 

2 J. W. Tukey, The Problem of Multiple
Comparisons. Privately circulated mono-
graph, 1953.

In order to prevent this kind of
confusion from the outset, we may
list at least five main cases in which
multiple statistical tests are em-
ployed:

1. Multiple comparisons. This
covers all cases in which results in
several different groups are to be
compared. Any statistic may be in-
volved: mean, median, frequency,
correlation coefficient, etc. For ex- 
ample, we may wish to compare the 
correlations between intelligence and 
school grade in Schools A, B, C, D, 
E, etc. The methods which have been 
explicitly published and taken up by 
psychologists have all been con- 
cerned, however, with comparisons 
in terms of the means of groups. 
Some methods for other statistics, 
e.g., proportions, are now beginning 
to appear. 

2. Multiple tests with intercorre-
lated variables. The most common in-
stance of this case is the computation
of a number of different correlation 
coefficients with a single batch of Ss. 
If 10 tests are given there may be 45 
intercorrelations and the researcher 
may wish to state which of these cor- 
relations are significant. 

3. Multiple variables in analysis of 
variance. A factorial design will per- 
mit the computation of several F 
ratios for the data of an experiment. 
These F ratios may not be inde- 
pendent if a common error estimate 
is used for several of them. Whether 
independent or not, several tests are 
made in the same experiment, and the 
implications of this fact need to be 
analyzed. Similar problems arise if 
other kinds of analysis are used for 
what is essentially a factorial design. 
For example, several nonparametric 
tests may be made of different rear- 
rangements of the data in a way 
which is equivalent to analysis of the 
main effects in analysis of variance. 

4. Replicated tests of a single hy- 
pothesis. In the first three cases men- 
tioned above the statistical tests are 
concerned with different hypotheses. 
For example, the different F tests in 
a factorial design are concerned with 
different variables. This fourth head-
ing is concerned with cases where the
same experiment is repeated with dif-
ferent groups of Ss and repeatedly
tested for statistical significance.

5. Overlapping measures relating to 
a single hypothesis. Several different 
ways of measuring the same underly- 
ing variable may be available—e.g., 
different measures of rate of learning 
—and a significance test is applied to 
each of the measures separately. 

The main purpose of the list is to 
emphasize that we are concerned 
only with the first of these headings.
Space will not permit us to analyze 
the other cases, which must be left to 
later discussions. 


GENERAL ISSUES IN MULTIPLE 
COMPARISONS 


A priori vs. a posteriori compari-
sons. It has been assumed that no
special modifications of classical
methods are needed where the com-
parisons to be made are specified in
advance of the collection of data (a
priori). Most of the recent literature
on multiple comparisons has concen-
trated upon methods for making
comparisons suggested by the data
(a posteriori, also called post-mortem
comparisons). For example, suppose
that five conditions of learning are
being compared. In advance, the ex-
perimenter predicts from his learning
theory that Condition A will lead to
most rapid learning, Condition B
will be second, and so on. Fisher
(1947), and others following him,
have recommended that the experi-
menter perform an over-all F test
first, then, if this is significant, he
may perform ordinary t tests be-
tween A and B, B and C, etc. It is
pointed out, however, that this
method would be incorrect if the
comparisons to be tested had not
been selected in advance (Fisher,
1947; McHugh & Ellis, 1955; Stan-
ley, 1957). The new methods have
been designed for comparisons sug-
gested by an inspection of the data.

We shall contend that the differ- 
ences between the a priori and the a 
posteriori situation are slight, or even 
nonexistent, when everything is taken 
into account. This is to say that the 
newer methods are needed for all 
multiple comparisons, and that the 
classical methods are inappropriate 
even in a priori comparisons, except 
for very special circumstances. 

The issue here is similar to that in-
volved in the debate over “one-
tailed” vs. “two-tailed” tests of sig-
nificance for comparing two groups
(Burke, 1953; Hick, 1952; Jones, 
1952; Marks, 1951). The one-tailed 
test is appropriate only if the direc- 
tion of difference is predicted in ad- 
vance, and if the experimenter is 
willing to overlook any difference in 
the opposite direction, no matter 
how large. Only two conclusions are 
possible from the data when a one- 
tailed test is used—either there is a 
difference in the predicted direction, 
or the results of the experiment are 
inconclusive; in effect, the experi- 
ment cannot obtain results which are 
considered as a significant refutation 
of the prediction. If the experi- 
menter allows for the possibility of a 
result that contradicts his hypothe- 
sis, he must use a two-tailed test, and 
there is no difference in method of 
analysis from that used in an empiri- 
cal experiment where no predictions 
are made in advance. 

In the case of more than two 
means, the number of possible con- 
clusions is increased. We may have 
not only confirmation or contradic- 
tion of the prediction, but we may 
also have varying degrees of partial 
agreement with the prediction. Since 
it is usually not specified in advance 
what will be considered as a partial 
confirmation of the prediction, the
situation is reduced essentially to the
a posteriori case. Only if the experi-
menter states in advance all possible
conclusions and the rules by which
these conclusions will be drawn,
would he have an a priori test.

Because of the multiplicity of con- 
clusions which might be drawn, it 
would appear most feasible to con- 
sider the statistical analysis as inde- 
pendent of any predictions of the ex- 
perimenter. In other words, we con- 
sider the statistical analysis as a 
method of making statements about 
the state of affairs as revealed by the 
data. If it turns out that the state of 
affairs is in complete or partial agree- 
ment with the prior prediction, the 
experiment makes the theory more 
plausible. If the results are wholly 
or partially in opposition to the pre- 
diction, then the theory needs to be 
revised. 

At this point the position must be 
stated very dogmatically. After some 
of the other problems have been 
dealt with, and a more complete 
terminology has been developed, we 
shall be able to give these conclusions 
further support. 

The concept of the error rate. The 
notions of significance level or confi- 
dence level have been useful ideas so 
long as we were dealing with a single 
difference between one pair of means, 
a single F ratio, a single chi-square 
value, and so on. The use of these 
terms becomes confused, however, 
when we are making simultaneous 
statements about a number of differ- 
ent comparisons of means, several 
different F ratios in a single experi- 
ment, or the like. The confusion is 
due to the fact that the concept of 
significance level may be extended in 
several different directions when we 
are considering multiple comparisons 
or multiple tests. We owe much to
J. W. Tukey, who has clarified this
point, and who has developed the
concept of error rate for multiple
comparisons (see Footnote 2).

There are several different kinds of 
error rate involved in the multiple 
comparison problem (and in other 
situations involving multiple tests). 
Some methods of making multiple 
significance tests fix one of the error 
rates at a suitably low level, but may 
allow the other error rates to become 
absurdly large. The problem be- 
comes that of deciding which error 
rates should be kept under control, or 
what compromises may be effected. 

Three of the main kinds of error 
rate are: 

1. Error rate per comparison. This 
is the probability that any particular 
one of the comparisons will be incor- 
rectly considered to be significant. 
In general this approach is discour- 
aged by statisticians for reasons ex- 
plained below. 

2. Error rate per experiment. This
is the long-run average number of er-
roneous statements per experiment.
In statistical jargon it is the expected
number of errors per experiment. Un-
like the first error rate, which is a
probability, the error rate per experi-
ment could be greater than one. That
is, we could set a criterion of “sig-
nificance” in such a way that we
would average three false statements
for each experiment.

3. Error rate experimentwise. This
is the probability that one or more
erroneous conclusions will be drawn
in a given experiment. In other
words, experiments are divided into
two classes: (a) those in which all
conclusions are correct, and (b) those
in which some conclusions are incor-
rect. The error rate experimentwise
is the probability that a given experi-
ment belongs in class (b).

3 Tukey’s terminology is based upon “families”
of comparisons rather than upon the experiment.
In the one-dimensional case, these are equivalent
terms. That is, the comparison of each mean with
each other in the experiment is a “family” of
comparisons. If we are concerned with two-variable
analysis, however, the experiment may be broken
down into two families of comparisons, one for each
variable. We could therefore specify an error
rate per family and a rate familywise as well
as per experiment and experimentwise. Our
discussion will be based primarily upon the
one-dimensional problem, and it seemed that
the issues would be clearer if we emphasized
the experiment as a unit. Even where there
are several families of comparisons, we shall
argue that the experiment should be the basis
of analysis of the error rates. Another discussion
of experiment-based error rates is found
in H. O. Hartley (1955).

It may help to understand the dis-
tinctions among these error rates if
we think of a long series of experi-
ments carried out in a given field,
all with the same experimental de-
sign. In each experiment a certain
number of statements of significance
is made, e.g., “Method A is signifi-
cantly better than Method B”;
“Method C is significantly poorer
than Method B.” To be concrete,
suppose that there were 1000 experi-
ments, each with 10 statements of
significance, 10,000 statements in all.
Of these statements, 90 are actually
false, and these false statements are
distributed among 70 of the experi-
ments. The different error rates are
then as follows:

1. Error rate per comparison:
90/10,000 or .009

2. Error rate per experiment:
90/1000 or .09

3. Error rate experimentwise:
70/1000 or .07

If we look only at the error rate per 
comparison we would say that the 
statements of significance were made 
at better than the “.01 level.” Yet
the probability is greater than .05
that any given experimental report
will contain one or more false claims
of significance. 
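
The arithmetic of this example can be restated as a short Python sketch. The function name and argument names are our own illustrative choices and are not part of the original discussion; the counts are those given in the text.

```python
from fractions import Fraction


def error_rates(n_experiments, statements_per_experiment,
                false_statements, experiments_with_errors):
    """Return (per comparison, per experiment, experimentwise) error rates.

    Hypothetical helper: it only restates the three definitions given
    in the text as ratios of counts over a long series of experiments.
    """
    total_statements = n_experiments * statements_per_experiment
    # Proportion of all individual statements that are false.
    per_comparison = Fraction(false_statements, total_statements)
    # Average number of false statements per experiment (may exceed 1).
    per_experiment = Fraction(false_statements, n_experiments)
    # Proportion of experiments containing one or more false statements.
    experimentwise = Fraction(experiments_with_errors, n_experiments)
    return per_comparison, per_experiment, experimentwise


per_comparison, per_experiment, experimentwise = error_rates(1000, 10, 90, 70)
print(float(per_comparison))   # 0.009
print(float(per_experiment))   # 0.09
print(float(experimentwise))   # 0.07
```

The per-experiment figure is an average count rather than a probability, which is why it is the only one of the three that can exceed one.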

The various error rates are all the 
same in a simple experiment with a 
single comparison, but they become 
more and more divergent as the num- 
ber of comparisons per experiment in- 
creases. Thus, if each of 10 means is 
compared with each of the others 
there are 45 comparisons in one ex-
periment. If the “significance level”
(Error Rate 1 above) of the test ap- 
plied to each comparison is .01, we 
should expect .45 erroneous conclu- 
sions per experiment. The probability 
that there will be one or more incor- 
rect conclusions in a given experi- 
ment (Sense 3) will be somewhere be- 
tween these two values, usually closer 
to .45, as will be explained below. 
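
As a rough illustration of how far the rates diverge, the following Python sketch counts the pairwise comparisons among 10 means and brackets the experimentwise rate. The function name is hypothetical. The "independent" figure assumes that the 45 tests are statistically independent, which is not true of comparisons that share means, so it is offered only as a reference point and not as the actual experimentwise rate.

```python
from math import comb


def rates_for_all_pairs(n_means, alpha):
    """Error rates when every mean is compared with every other mean.

    Assumes the complete null hypothesis and a per-comparison rate of
    `alpha`. Returns the number of comparisons, the per-experiment rate,
    (lower, upper) limits for the experimentwise rate, and the value the
    experimentwise rate would take if the tests were independent.
    """
    n_comparisons = comb(n_means, 2)
    per_experiment = n_comparisons * alpha
    # At least one error is no less likely than an error in one fixed
    # comparison, and no more likely than the expected number of errors.
    lower = alpha
    upper = min(1.0, per_experiment)
    independent = 1 - (1 - alpha) ** n_comparisons
    return n_comparisons, per_experiment, (lower, upper), independent


n, per_exp, limits, independent = rates_for_all_pairs(10, 0.01)
print(n)                      # 45 comparisons
print(round(per_exp, 2))      # 0.45 expected errors per experiment
print(limits)                 # experimentwise rate lies within these limits
print(round(independent, 2))  # reference value under assumed independence
```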

Which of the three values is, then,
the “significance level” to be at-
tached to the conclusion from this ex-
periment? This is a point for exten-
sive analysis, but we shall need more
concepts before we can do it justice.
At this point we shall say only that 
the basis for the choice between these 
three rates is still incompletely an- 
alyzed. Statistical workers have rec- 
ognized the problem and have de- 
veloped their procedures for multiple 
comparisons primarily on the basis of 
the third rate of error—the proba- 
bility that one or more erroneous con- 
clusions will be made in a given ex- 
periment, the experimentwise error 
rate. The implications of this deci- 
sion have not, however, been exten- 
sively developed, at least to the pres- 
ent writer’s knowledge. 

Multiple null-hypotheses. The con- 
cept of error rate cannot be defined 
completely without taking account of 
another important fact. In our dis-
tinctions between error rates per
comparison, per experiment, and ex-
perimentwise, the reader may have
inferred that the null hypothesis
would be that all means are drawn
from a single population—the same 
null hypothesis which is tested in anal- 
ysis of variance by means of the F 
test. We shall call this the “com-
plete” null hypothesis. This is one
possibility which must be consid- 
ered, but it is not by any means the 
only one. In our example of 10 
means, five might be drawn from one 
population and five from another, six 
from one and four from the other, 
two from each of five different popu- 
lations, and so on. For each of these 
different null hypotheses, there is an 
error rate per comparison, per experi- 
ment, and experimentwise, for any 
given method of testing differences. 
The question is, therefore, which of 
these null hypotheses is used to de- 
fine the error rate for our statistical 
test? 

Tukey’s answer (see Footnote 2) 
to the above question is to define the 
error rate as the maximum value it at- 
tains under all possible null-hy- 
potheses. Some of the currently pro- 
posed methods for multiple compari- 
son are based solely upon the com- 
plete null hypothesis as the standard, 
even though the error rate may be 
higher with some other null hypothe- 
sis. Tukey’s decision would seem the 
most reasonable as well as the most 
cautious approach to this aspect of 
the problem. 

To show how the error rate may be 
higher for some partial null hypothe-
sis than it is for the complete hy-
pothesis, let us consider a specific
method of testing multiple differ- 
ences based upon traditional ap- 
proaches. Ten groups are being com- 
pared, and we test first with an over- 
all F test at the .01 level. Then if the 
F test shows significance, we will 
test each difference with an ordinary 
t test at the “.01 level.” The experi-
mentwise error rate is .01 if we con-
sider only the complete null hy-
pothesis, since no further compari-
sons will be made if the F test does
not show significance. The F test is
specifically designed to produce this 
error rate under the complete null 
hypothesis. 

Suppose, however, that there are 
actually five populations, with two 
groups drawn from each population, 
and suppose that these populations 
are widely separated. Then it is al- 
most certain that the F test will be 
significant, and t tests between pairs
drawn from distinct populations will 
also be almost certainly significant, 
as they should be. We can still make 
errors, however, in comparing means 
in the pairs drawn from identical 
populations. Since there are five such 
comparisons, the probability that 
one or more of these will be incor- 
rectly judged to be significant is 
$1 - (.99)^5$, which is approximately .05.
Thus the error rate experimentwise
is .05 instead of .01 for this particular
null hypothesis. The more means
there are to be compared by this
method, the higher will the experi-
mentwise error rate become, even
though the error rate based upon the
complete null hypothesis is fixed at
.01 for any number of means.
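
The growth of the experimentwise rate under this kind of partial null hypothesis can be sketched in Python. The function name is hypothetical, and the sketch rests on two simplifying assumptions taken from the example: the over-all F test is treated as certain to be significant, and the t tests within the identical pairs are treated as independent of one another.

```python
def experimentwise_rate_for_pairs(n_groups, alpha=0.01):
    """Experimentwise error rate when groups come in identical pairs.

    Assumes n_groups / 2 widely separated populations with two groups
    drawn from each, so that only the within-pair comparisons can give
    a false "significant" result, each with probability `alpha`.
    """
    n_true_null_comparisons = n_groups // 2
    # Probability that one or more of the within-pair tests is
    # incorrectly judged significant, assuming independent tests.
    return 1 - (1 - alpha) ** n_true_null_comparisons


for n_groups in (10, 20, 40, 100):
    print(n_groups, round(experimentwise_rate_for_pairs(n_groups), 3))
# The first line printed is "10 0.049", the value of about .05 in the text.
```

Under these assumptions the rate for the complete null hypothesis stays at .01 throughout, while the rate for the paired arrangement rises steadily with the number of groups.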

Error rates and a priori compari- 
sons. Now that we have looked at 
some of the different ways of evaluat- 
ing error rates, we can deal more con- 
cisely with the problem of a priori 
vs. a posteriori comparisons. As an 
example, consider a learning experi- 
ment in which five conditions are be- 
ing compared, and suppose that the 
experimenter has predicted in ad- 
vance the complete order in which the 
means should appear. He has, in ef- 
fect, predicted significant differences 
for all possible comparisons of the 
five means, and complete agreement 
with the theory should produce 10
significant differences. Suppose that
he merely computes all 10 t ratios in
the standard way, determining their
significance by reference to the
standard “Student” tables, and as-
sume that he uses the .01 levels from
these tables. The method which this
experimenter has used has an error
rate of .10 per experiment, and also
experimentwise, even though all of
the tests were computed on the basis
of a .01 level for the individual com-
parisons. In other words, in 10% of
experiments analyzed by this method,
there will be one or more “signifi-
cant”⁴ differences, even though the
complete null hypothesis is true.

Compare this with the case where
no predictions were made in advance.
The experiment is performed to “see
what happens” and, again, all possi-
ble t tests are computed. The error
rate is exactly the same as it was when
advance predictions were made, if
we leave aside the question of “one-
tailed” vs. “two-tailed” tests. (If
the experimenter in the a priori case
wishes to allow for contradictions to
his theory which could come out to
be “significant” he must use a two-
tailed test, just like the experimenter
who makes his comparisons a posteriori,
after the results are in.)

In other words, the essential factor 
is the number of comparisons to be 
made and the error rate to be used, 
rather than the question of a priori 
vs. a posteriori comparisons. When 
ordinary t tests are applied to all
comparisons, each of the different 
kinds of error rate is the same 
whether predictions were made in ad- 
vance or not. 

4 In this discussion “significant” in quotes
refers to a difference which would be judged
to be significant in using the classical tables
and based on single comparisons.

The only situation in which ad-
vance predictions make a difference
would be that in which several
groups are studied, but only certain 
pairs are to be singled out for signifi- 
cance tests. Suppose that in the a 
priori case, one pair is specified in ad- 
vance as the only comparison of in- 
terest, while in the a posteriori case, 
only the largest difference is to be 
studied. The probability that the 
largest of 10 comparisons will be sig- 
nificant is not the same, of course, as 
the probability that a pair chosen at 
random will be significant. The null 
hypothesis is that the pair chosen in 
advance by the theory might as well 
have been chosen at random. Thus a
t test applied in the usual way at the 
.01 level has a probability of .01 of 
being significant, in the a priori case. 
When the largest of the 10 differ- 
ences is chosen, it has a probability 
of .10 of appearing to be significant at 
the .01 level by classical two-mean 
tests. The probability that the larg-
est difference will be “significant” is
the same as the probability that there 
will be one or more “significant” 
pairs among the 10 comparisons. In 
other words, the experimentwise error 
rate for all comparisons applies to 
this special case. 
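
The contrast between a pair named in advance and the largest observed difference can be illustrated with a small Monte Carlo sketch in Python. Everything in it is an assumption made for illustration: five groups (hence 10 comparisons), the complete null hypothesis, group means expressed in units of their standard error, and a two-tailed normal-theory test with known variance standing in for the ordinary t test. The function and variable names are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def simulate(n_groups=5, alpha=0.01, n_trials=20000, seed=0):
    """Estimate two rates of false "significance" under the complete null.

    Returns the proportion of simulated experiments in which (1) one
    pair fixed in advance is "significant" and (2) the largest of all
    pairwise differences is "significant".
    """
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    pairs = list(combinations(range(n_groups), 2))
    a_priori_hits = 0
    largest_hits = 0
    for _ in range(n_trials):
        # Each group mean is drawn with unit standard error, so a
        # difference between two means has standard error sqrt(2).
        means = [rng.gauss(0.0, 1.0) for _ in range(n_groups)]
        z = [abs(means[i] - means[j]) / sqrt(2.0) for i, j in pairs]
        if z[0] > critical:        # the pair chosen before the data
            a_priori_hits += 1
        if max(z) > critical:      # the largest difference, chosen after
            largest_hits += 1
    return a_priori_hits / n_trials, largest_hits / n_trials


a_priori_rate, largest_rate = simulate()
print(a_priori_rate, largest_rate)
```

The first estimate should lie near .01. The second is, by construction, also the proportion of experiments with one or more "significant" pairs. The figure of .10 in the text equals 10 × .01, which is an upper limit for that proportion; because the 10 differences share means, a simulated value may fall somewhat below it.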

The above example is helpful in 
seeing the issues involved in multiple 
comparisons, but it has little practi- 
cal application. Tukey suggests that 
it might occur when all but two of the 
groups were studied as “camouflage” 
and only the particular two are of 
interest to the experimenter. Usu- 
ally, however, an experimenter who 
is testing a theory will use five groups 
for one of two reasons: (1) all are in- 
terrelated in the theoretical predic- 
tions or (2) some of the groups are 
predicted from theory while others 
are unpredictable from the stand- 
point of theory but the experimenter 
wishes to find out how they compare 
with each other and with those which 


are predicted by the theory. We 
have already shown that the first 
case is no different from the com- 
pletely empirical exploratory study. 
The second case would be different 
only if the results were considered as 
belonging to two separate and unre- 
lated experiments—one group of 
comparisons being used to test the 
theory, the other comparisons being 
considered as part of another empiri- 
cal exploration. 

In all of these examples, we have 
assumed that the experimenter who 
is making a priori comparisons will 
consider each “significant” difference
in the predicted direction as supporting
his theory, and each “significant”
difference in the opposite direction 
as a contradiction to his theory. He 
could, of course, have specified other 
rules for interpreting the results. In 
actual practice of psychological re- 
search, however, he rarely does, and 
the usual situation is that no rules at 
all are specified in advance. The decision
as to what constitutes “agreement,”
“partial agreement,” and so
on, is made only after the results are 
in and the significance tests are made. 
This is another strong reason, already 
mentioned in the preliminary discus- 
sion of this problem, for treating all 
cases of multiple comparison in the 
same manner, whether there are pre- 
dictions in advance or not. 

Nevertheless, we should investi- 
gate to see if carefully specified rules 
for interpreting the results in rela- 
tion to the theory would have any ef- 
fect upon the significance tests. Sup- 
pose, for example, that the experi- 
menter says, “If there are at least 
some significant differences in the 
predicted direction, and none in the 
opposite direction, I shall consider 
the theory as partially substantiated. 
If there are any differences which appear
to be significant reversals of prediction
I shall revise or abandon the
theory.” The error rate must now include
errors of false acceptance of the
theory and also errors of false rejec- 
tion. 

Here it is easy to show that there 
are circumstances in which false ac- 
ceptance of the theory is almost cer- 
tain. For example, suppose that 
Populations A and B are equal and 
substantially higher in mean than 
Populations C and D, the latter pair 
also being equal. Finally, Population 
E has a considerably lower mean than 
any other group. The psychological 
theory has predicted that all groups 
are different, with mean A the high- 
est, B next, and so on down to E. In 
other words we are assuming that 
the actual state of affairs is in partial 
agreement with the theoretical pre- 
dictions, but that the theory is 
wrong in the relation of A to B and 
of C to D. We suppose in addition 
that the differences which do exist 
are so large that significant differ- 
ences are almost certain in those par- 
ticular comparisons. In this situation 
the theory will be rejected only if A 
is found to be significantly lower than 
B, or C lower than D. In all other 
cases the theory will be considered as 
supported by the experimental results.
If t tests are made at the .01
level there is only a .005 probability 
that Groups A and B will be found 
in significant contradiction to the 
theory, and the same value applies 
to the C-D pair. The probability that 
one or both will be reversed is ap- 
proximately .01. Therefore the ex- 
perimenter has a .99 probability of 
finding support for his theory and 
only a .01 probability of contradict- 
ing it. 
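
The .01 figure comes from adding the two one-sided reversal probabilities. As a sketch, write $R_{AB}$ and $R_{CD}$ (labels introduced here only for illustration) for the events that the A-B pair and the C-D pair each show a "significant" reversal, and assume, as the example does, that the real differences are large enough to be detected almost certainly:

```latex
\begin{align*}
P(R_{AB} \cup R_{CD})
  &= P(R_{AB}) + P(R_{CD}) - P(R_{AB} \cap R_{CD}) \\
  &\le .005 + .005 = .01, \\
P(\text{theory judged supported})
  &= 1 - P(R_{AB} \cup R_{CD}) \ge .99 .
\end{align*}
```

The subtracted term is the probability that both pairs reverse at once. It is very small, so the sum of the two tail probabilities is an adequate approximation.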

At this point the reader may ob- 
ject that accepting the theory under 
these circumstances should not be 
considered as entirely erroneous. After
all, the actual state of affairs in
the populations is at least partially in 
agreement with the prediction from 
theory. Certainly, to accept the 
theory in this case would not be so 
bad as to accept the theory when the 
actual population values are a com- 
plete reversal of the predicted levels. 
The question then becomes: How do 
we evaluate different degrees of 
agreement between the actual state 
of affairs and the theoretical predic- 
tions? Clearly this cannot be done 
on the basis of probability, nor can it 
be built into a standard significance 
test. The seriousness of disagree- 
ment depends upon the structure of 
the theory and the nature of the 
groups being compared. For some 
theories, the fact that Populations A 
and B are equal could be a very cru- 
cial defect in the theory; in other 
cases, this might be only a minor 
point, easily rectified. If the rela- 
tive importance of all of the possible 
comparisons were stated in advance, 
with some kind of numerical weights, 
it would be possible, although very 
complicated, to compute probabilities 
for each outcome and also some kind 
of a weighted risk function. This 
would differ from experiment to ex- 
periment and would probably be of 
little practical value. 

To summarize, it is argued that 
comparisons decided upon a priori
from some psychological theory
should not affect the nature of the
significance tests employed for multi- 
ple comparisons. Our reasons may be 
recapitulated as follows: 

1. Ordinarily, no statement is made 
in advance as to what will be consid- 
ered substantial agreement, partial 
agreement, partial contradiction, or 
complete disagreement with the the- 
ory. Even if such a statement were 
made, the probabilities of each of 
these conclusions being drawn incorrectly
would have to be included in
the error rate. 

2. A theory which predicts the 
complete order of the results calls for 
just as many comparisons as the em- 
pirical experiment in which no pre- 
diction is made. Since the number of 
comparisons to be made is a crucial 
factor in the error rate, there is no 
difference in this respect between a 
priori and a posteriori comparisons. 

3. Some comparisons may be more 
important to a theory than others. It 
is not feasible, however, to take ac- 
count of this fact in devising signifi- 
cance tests or methods of setting con- 
fidence limits, since the relative 
weights would differ from experiment 
to experiment and would have to be 
specified quantitatively in advance. 
It is therefore more practical to ex- 
amine the results in a common-sense 
manner and to evaluate qualitatively 
the degree of support or contradic- 
tion offered by the data. 

Nonindependence of comparisons. 
In the textbooks, the student is sometimes
warned against a posteriori
comparisons, because the different 
comparisons are not independent of 
each other. While lack of independ- 
ence is a factor to be taken into ac- 
count, it is not at all the main prob- 
lem. In fact, the error rates per com- 
parison and per experiment are com- 
pletely unaffected by independence 
or lack of it. The only important 
factor in these rates is the number of 
comparisons to be made. Only the 
experimentwise error rate is affected 
by independence. If all of the com- 
parisons are perfectly positively cor- 
related, all are significant or nonsig- 
nificant en bloc. Then the experi- 
mentwise rate is equal to the rate per 
comparison. In the case of complete 
independence, of negatively corre- 
lated comparisons, or even of mod- 
erate positive correlation, the experi- 


mentwise rate is nearly equal to the 
rate per experiment, when the latter 
is small. Most cases of multiple com- 
parison fall into the latter category, 
so that the dependence of the com- 
parisons has but a slight effect. 
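
A small simulation can illustrate the two extremes described above. It is only a sketch under stated assumptions: the function `simulate_rates`, the seed, and the number of simulated experiments are choices made here for illustration, and each comparison is reduced to a chance event with probability `alpha` under the complete null hypothesis.

```python
import random


def simulate_rates(alpha, k, perfectly_correlated, n_experiments=50000, seed=1):
    """Estimate (rate per experiment, experimentwise rate) under the complete null.

    perfectly_correlated=True  -> all k comparisons err or hold en bloc
    perfectly_correlated=False -> the k comparisons err independently
    """
    rng = random.Random(seed)
    total_errors = 0
    experiments_with_error = 0
    for _ in range(n_experiments):
        if perfectly_correlated:
            # One chance event decides all k comparisons together.
            errors = k if rng.random() < alpha else 0
        else:
            errors = sum(rng.random() < alpha for _ in range(k))
        total_errors += errors
        experiments_with_error += errors > 0
    return total_errors / n_experiments, experiments_with_error / n_experiments


ep_corr, ew_corr = simulate_rates(0.01, 10, perfectly_correlated=True)
ep_ind, ew_ind = simulate_rates(0.01, 10, perfectly_correlated=False)
print(f"perfect correlation: per experiment {ep_corr:.3f}, experimentwise {ew_corr:.3f}")
print(f"independence:        per experiment {ep_ind:.3f}, experimentwise {ew_ind:.3f}")
```

In both runs the estimated rate per experiment should stay near 10 times the rate per comparison. Only the experimentwise estimate moves, from about the per comparison rate under perfect correlation to nearly the per experiment rate under independence.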

In the multiple comparison prob- 
lem, the lack of independence in- 
volves the fact that each mean is 
compared with every other mean, 
and therefore appears in a number 
of different significance tests. In 
many cases also a single error esti- 
mate is used for all comparisons. The 
significance tests are therefore not
independent of each other, but this 
turns out to be less important than 
was once believed. Another kind of 
dependency must also be considered. 
The samples used in determining the 
various means may also not be inde- 
pendent of each other, notably in the 
case where the same Ss are used for 
each experimental condition. Such 
dependencies are easily taken care of 
by using two-way analysis of vari- 
ance with Ss considered as a second 
variable. 

The above conclusions on the relative
unimportance of the factor of
independence in multiple compari- 
sons do not necessarily apply to other 
cases of multiple significance tests. 
The other cases listed in the introduc- 
tion involve other kinds of depend- 
ency and must be analyzed sep- 
arately. 


THE CHOICE OF ERROR RATES 


In making multiple comparisons, 
then, neither specifying the tests in 
advance nor trying to arrange for independent
tests is of much importance,
since they have little practical
effect upon any of the error
rates. It is of much greater practical 
importance to consider which of the 
error rates is the best representation 
of the dependability or “significance”
of our results. We may work at the 
.01 level on a per comparison basis, 
yet the probability may be almost 
1.00 that we have made some errone- 
ous statements of significance in a 
given experimental report. In our 
current psychological literature the 
various bases for error rates are confused
and sometimes used interchangeably.

The problem we must consider is 
the implication of using a particular 
measure of error rate for clarity and 
consistency of treatment of our re- 
search results. The issue can be made 
concrete in an example: One experi- 
menter performs a series of four ex- 
periments. In the first experiment 
he compares Groups A and B, in the 
second, Groups B and C, etc. Each 
experiment is published separately 
with a t test applied to the difference
of means in each case. In each paper 
he summarizes the results obtained 
before and in the final paper he com- 
pares all five groups, still using simple 
t tests. A second experimenter, not 
so anxious for rapid and numerous 
publications, waits until all the results
are in on all five groups, and
performs an analysis of variance on 
all groups, considering the results as 
significant only if the F value is be- 
yond the .01 point. Both of these 
kinds of report are quite typical of 
the psychological literature. The 
second experimenter has used an ex- 
perimentwise rate of .01, at least for 
the complete null hypothesis, but he 
does not yet have any method of 
making specific comparisons between 
groups. The first experimenter has 
used a .01 level per comparison, but 
his experimentwise rate for the whole 
series of connected comparisons may 
be as high as .10, depending on how 
many of the possible comparisons 
among the five means are actually 
made. (To simplify matters, we as- 


sume that the first experimenter 
would have performed all four ex- 
periments regardless of the results. 
If he waited for the results of each 
experiment before deciding whether 
to continue the series, matters would 
be further complicated.) 

These examples should make clear 
that both the per comparison and the 
experimentwise rates are actually in 
use in typical researches now in the 
psychological literature, even when 
only classical techniques are used. 
The second example is now the more 
common approach, and even the first 
experimenter would probably be 
more likely to perform an analysis of 
variance in his last paper to sum- 
marize the over-all results. Whether 
he would be willing to retract earlier 
conclusions if the final analysis did 
not prove to be significant is, of 
course, an embarrassing question. 

The current widespread use of anal- 
ysis of variance would suggest 
adopting the experimentwise error 
rate as the standard practice. Cur- 
rent practice is not, however, suffi- 
cient justification unless it is based 
upon careful analysis. We must 
therefore examine the problem more 
fully. 

Since the rate per comparison is
the easiest to use and requires no new 
methods at all, we may first consider 
the main argument in its favor. It 
might be contended that it makes no 
difference whether specific compari- 
sons are made one at a time by dif- 
ferent experimenters, or in groups by 
a single experimenter. The same 
amount of data is added to the published
literature in either case. Therefore
if the simple t test is justified
in one case it should be justified in 
the other also. As Tukey (see Foot- 
note 2) states this argument (which 
he considers fallacious), the man who 
has studied several means at once has 
done more work and should be entitled
to make more erroneous conclusions.

There is, however, one very strong 
reason why an experimenter who 
studies a number of different groups 
or conditions has less justification 
for using a per comparison rate than 
an experimenter who performs a sin- 
gle experiment with two groups. 
Even if the complete null hypothesis 
is true, and the experimenter is work- 
ing with a factor or group of factors 
completely irrelevant to the behavior 
he is studying, the more conditions 
or the more variations of experi- 
mental conditions he studies the 
more chance he has of finding some 
differences which would appear to be 
significant on a per comparison basis. 
Thus he can obtain more “signifi- 
cant” differences by working harder 
upon irrelevant variables. This is 
the reason why Tukey considers the 
point of view of allowing erroneous 
conclusions in proportion to the 
amount of work done as an untenable 
point of view. 

The notion of allowing more errors 
per experiment in proportion to the 
amount of work done in the experi- 
ment would lead to another practice 
which is contrary to present usage. 
It would mean that the significance 
level in the ordinary two-group ex- 
perimental design could be reduced 
as the number of cases is increased. 
In this situation we ordinarily main- 
tain the significance level constant, 
but we gain through increases in 
power as the number of observations 
is increased. In the case of multiple 
comparisons we do not gain in power 
in the specific comparisons as the 
number of groups increases, but there 
is a compensation in that more in- 
formation about more different rela- 
tionships is gained as the number of 
comparisons increases. 

There are even objections to the 





use of error rate per comparison in a 
certain type of “experiment” in 
which only two groups are com- 
pared. Consider the following situa- 
tion: an experimenter is convinced 
that a certain factor should produce a 
difference in learning rate. He tries 
it once and fails to get a significant 
difference. He is so sure that the ex- 
periment should have worked that 
he reconsiders his experimental tech- 
nique for possible errors. He decides 
that some actually irrelevant feature 
of the experiment is responsible, 
changes it, and tries again. Finally 
after many different revisions of the 
conditions, all actually irrelevant, he 
obtains a “significant” difference and
publishes the result. We assume that, 
as an honest scientist, he will men- 
tion in his report that several other 
trials failed, but this will not usually 
affect his test of significance, and he 
will usually explain away the earlier, 
unsuccessful trials as due to errors in 
technique. Clearly, all of his data 
should be tested as a single experi- 
ment, otherwise obtaining a “signifi- 
cant” difference will depend only 
upon the experimenter’s stubborn- 
ness and patience, or upon the num- 
ber of his research assistants. 
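
How far stubbornness alone can carry such an experimenter is easy to estimate. The sketch below assumes, for illustration only, that every attempt is an independent test of a true null hypothesis at the same per comparison level. The function names are hypothetical.

```python
import math


def chance_of_some_success(alpha, attempts):
    """Probability that at least one of `attempts` null trials is "significant"."""
    return 1.0 - (1.0 - alpha) ** attempts


def attempts_for_even_odds(alpha):
    """Smallest number of null trials giving at least a .5 chance of one "success"."""
    return math.ceil(math.log(0.5) / math.log(1.0 - alpha))


for level in (0.05, 0.01):
    n = attempts_for_even_odds(level)
    print(f"level {level}: {n} attempts give probability "
          f"{chance_of_some_success(level, n):.3f} of a 'significant' result")
```

Treating all of the attempts as a single experiment removes this dependence on the number of trials, which is the point of the example.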

Error rate vs. Type II error and 
power. Several psychologists to 
whom the above argument has been 
presented have raised objections to 
the experimentwise error rate on the 
ground that it leads to great loss of 
power. They point out that a t ratio
may have to be as high as 4 or 5 for 
20 degrees of freedom to be significant 
at the .01 level instead of 2.85 as it is 
when significance is measured in the 
classical way. If this happens, they 
say, we are obviously going to miss a 
lot of real differences which might 
turn out to be important. 

While it is perfectly true that the
bigger the difference which is required
for significance, the less powerful
is the test⁵ (other things being
equal, of course), this fact is irrelevant
to the issue involved in the
choice of error rates. If the experimenter
prefers, he can still use a t
value of 2.85 as his criterion of sig- 
nificance even if he reports his re- 
sults in terms of error rates experi- 
mentwise. The difference is that his 
results will be reported as significant 
at (say) the .50 level experimentwise 
instead of the .01 level on a per com- 
parison basis (which is usually not 
labelled as such). 

In other words, the issue is not 
more or less powerful tests, since the 
power can be adjusted to any de- 
sired level, but simply how we are 
going to evaluate the Type I error. 
It must be admitted, however, that 
the tables which are available for 
establishing error rates on an experi- 
mentwise basis tend to limit the ex- 
perimenter to a fixed value of the 
error rate (usually .05). This situa- 
tion can be changed, however, if there 
is good reason to increase the power 
of the tests. 

Duncan's compromise. Duncan
(1951, 1955) has argued that there is
not only a loss of power in changing
from the per comparison to the experimentwise
basis, but that this
loss of power becomes progressively
greater as the number of comparisons
increases. Since his method has been
used in several recent research papers
in psychology, we shall examine his
assumptions in detail. Thus if one
experiment involves 5 means while
another experiment involves 10
means, and both are evaluated by
holding the experimentwise error
fixed at .05, the experiment with the
10 means is less powerful in the sense
that each difference must be larger
to be judged significant.


5 See Harter (1957) for evaluation of the 
power of several multiple comparison pro- 
cedures. 


According to Duncan, this state of 
affairs should be reversed. As more 
and more conditions are studied it is 
more and more likely that some real 
differences exist, and therefore the 
statistical tests should become more 
powerful as the number of compari- 
sons increases. This would be the 
case if we used the error rate per com- 
parison as our basis of establishing 
significance, but then the probability 
of Type I error reaches unreasonably 
high levels. As his compromise, Dun- 
can proposes to base statements of 
significance upon the rate of error 
per independent comparison or per 
degree of freedom. 

The argument for Duncan's 
method would be that when two dif- 
ferent experimenters each perform a 
simple comparison of two groups we 
allow them each a certain error rate, 
because they have performed two 
independent comparisons. It is pro- 
posed that this allowance be ex- 
tended to a single experiment in- 
volving several comparisons. If 10 
means are compared there are 45 
comparisons, but only 9 can be made 
if we are to keep them independent 
of each other. Table 1 indicates the
relationship among the rates per comparison,
per degree of freedom and
per experiment for multiple means.


TABLE 1

ERROR RATES PER EXPERIMENT(a) WHEN
ERROR RATES PER COMPARISON AND
ERROR RATES PER DEGREE OF
FREEDOM ARE CONTROLLED

No. of    Error Rate per    Error Rate per
Means     Comparison        Degree of Freedom
          Fixed at .01      Fixed at .01

   2          .01               .01
   3          .03               .02
   4          .06               .03
   5          .10               .04
  10          .45               .09
  20         1.90               .19
  50        12.25               .49

(a) Based on the complete null hypothesis.
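
The entries of Table 1 follow from two counts: the number of pairwise comparisons among the means, and the number of degrees of freedom (one less than the number of means). The sketch below recomputes them. The helper name `table1_row` is an assumption made for illustration.

```python
def table1_row(n_means, alpha=0.01):
    """Return (n_means, per-experiment rate with alpha per comparison,
    per-experiment rate with alpha per degree of freedom)."""
    n_comparisons = n_means * (n_means - 1) // 2   # all pairs of means
    degrees_of_freedom = n_means - 1               # independent comparisons
    return n_means, n_comparisons * alpha, degrees_of_freedom * alpha


print("means  per comparison  per degree of freedom")
for n in (2, 3, 4, 5, 10, 20, 50):
    n_means, per_cmp, per_df = table1_row(n)
    print(f"{n_means:>5}  {per_cmp:14.2f}  {per_df:21.2f}")
```

Values above 1.00 are possible because the rate per experiment is an expected number of erroneous statements, not a probability.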

The arguments against the per 
comparison basis of testing also ap- 
ply, although not so powerfully, 
against Duncan’s compromise pro- 
cedure. The experimenter still can 
increase his chances of finding sig- 
nificant differences by multiplying 
the number of irrelevant conditions 
which he studies in a given experi- 
ment. The probability of erroneous 
conclusions does not increase as fast 
as the number of comparisons in- 
creases, but it still increases. 

While it is true that it becomes less 
and less likely that all populations 
have the same mean, as we increase 
the number of groups, it is not known 
to what extent these differences are 
relevant to the problem being studied. 
We may therefore merely be increas- 
ing our probability of detecting dif- 
ferences which are the random result 
of factors which are not under study 
in the experiment. In other words, we 
increase the risk of finding a hodgepodge
of “significant differences”
which cannot be given a meaningful 
interpretation. 

Duncan’s approach is also contrary
to common practice, where the
F test is applied at the same proba- 
bility level, regardless of the number 
of groups under study. To be sure, 
common practice in analysis of vari- 
ance could also be wrong and could 
be revised according to Duncan's 
point of view, but we need to have 
some stronger arguments for doing so. 

Since the degree of conservatism 
and the inversely related power of 
the test can be explicitly varied by 
choosing varying rates of error per 
experiment or experimentwise, de- 
pending upon the type of material 
being studied and the purposes for 
which conclusions are being drawn, 
it does not seem necessary to adopt a 


rigid compromise between the per 
comparison and the experiment-based 
rates. Thus Duncan’s special pro- 
cedure seems unnecessary and may 
confuse the issues for the user of sta- 
tistics. 

Rates per experiment vs. experi- 
mentwise rates. While we cannot say 
flatly that all significance tests or all 
confidence limits must be based upon 
the experiment as a unit, there are, as 
we have seen, strong reasons to make 
the experiment the normal reference 
unit at least. In any event, it should 
always be made clear in an experi- 
mental report which approach is be- 
ing used. If the rate per comparison 
is chosen, it should require special 
justification. 

Although the two experiment- 
based error rates are often numeri- 
cally almost equal, they do represent 
somewhat different points of view 
about the nature of the conclusions 
from an experiment. In one case we 
control the total number of erroneous 
statements made in each experiment 
(rate per experiment). In the other, 
we consider that any erroneous state- 
ment spoils the conclusions from that 
experiment. In other words, the ex- 
perimentwise rate is based on the as- 
sumption that it is just as bad to 
make one erroneous conclusion as it 
is to make six in the same experi- 
ment. 

If we have to make a choice be- 
tween these two approaches, it will 
depend upon rather subtle differ- 
ences in the manner in which the ex- 
perimental conclusions are to be 
used. If the total set of conclusions 
is considered as a pattern supporting 
some theoretical position in such a 
way that any erroneous statement 
would destroy the pattern, then the 
experimentwise basis is clearly the 
one to use. If each fact can be inter- 


preted independently of the other 
findings of the experiment, the per
experiment basis is more appropri- 
ate. 

In practice, our interpretation of 
experimental findings probably does 
not fall clearly at either of these ex- 
tremes. One erroneous statement 
probably will not completely destroy 
the value of the findings, but, on the 
other hand, each “fact” must be interpreted
in some relation to the
other results. We would therefore be 
in difficulties, if the choice between 
the two bases were crucial. 

In a great many of the common ex- 
perimental designs, the per experi- 
ment basis can be worked out from 
statistical tables of standard tests al- 
ready in existence although they 
must be more extensive than those 
given in the textbooks. Special tables 
must be developed for the experi- 
mentwise error rates, but a number 
of these are already available. The 
principal practical advantage of the 
per experiment rate lies, therefore, in 
those cases where special tables are 
not yet available for experimentwise 
rates. 

Where there is any discrepancy be- 
tween results computed in the two 
ways, the per experiment basis is 
more conservative than the experi- 
mentwise basis. We are therefore 
safe in using the rate per experiment 
when in doubt, or when the experi- 
mentwise rate cannot be calculated, 
in that the error rate per experiment 
is always larger than or equal to the 
experimentwise rate. 

An algebraic statement of the rela- 
tionships among the various error 
rates may help to show why some of 
our previous statements about them 
are true. Let: 


$p_{1/1}$ = probability of one erroneous
statement in a single trial using a
particular critical value in a certain
test (for example, if t is considered
significant when it exceeds 2.75
and the degrees of freedom
for error are 30, $p_{1/1} = .01$).
This is the error rate per comparison.

$p_{1/k}$ = probability of exactly one
erroneous statement out of a
total of k statements which
are made.

$p_{2/k}$ = probability of exactly two
erroneous statements out of k,
etc.

EP = error rate per experiment

EW = error rate experimentwise

Then, by definition:

EP = expected number of errors
per experiment

$$EP = p_{1/k} + 2p_{2/k} + 3p_{3/k} + \cdots + k\,p_{k/k}$$

$$EW = p_{1/k} + p_{2/k} + p_{3/k} + \cdots + p_{k/k}$$

It can also be shown that $EP = k\,p_{1/1}$.

Thus EP is never less than
EW, and the difference between them
depends upon the probabilities of
more than one erroneous statement.
The very simple relationship between
EP and $p_{1/1}$ shows why EP can be
calculated with standard tables. Suppose,
for example, that we are
comparing 10 means. There are then
(10)(9)/2 = 45 comparisons to be
made. If each difference were tested
with an ordinary t test at the .01
level, EP is 45(.01) or .45. To reduce
EP to the .01 level, we simply reduce
$p_{1/1}$ to .01/45 = .00022 and find the
corresponding value of t. The approximate
value of the required t can
be obtained from Pearson and Hartley’s
Table 9, “Probability Integral
of the t-Distribution” (Pearson &
Hartley, 1954). It turns out to be
about 4.1 in the case given in the example
above, where there are 30 degrees
of freedom. Thus, changing
the critical value of t from 2.75 to 4.1
changes our error rate from .01 per
comparison to .01 per experiment.
(It is assumed that all differences are
to be tested against a common critical
value of t. Later we shall show
that it is possible to obtain a more
sensitive significance test by using
variable t ratios depending upon the
observed order of the means.)
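
Where a sufficiently fine table of t is not at hand, the critical value for a given rate per experiment can be approximated numerically. The sketch below is an illustration under stated assumptions, not the procedure used in the text: it integrates the t density by Simpson's rule and searches by bisection, and the function names, integration limit, and step count are all choices made here. A result computed this way need not coincide exactly with a value read from a printed table with gaps.

```python
import math


def t_density(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, upper=60.0, steps=6000):
    """Two-tailed probability beyond |t| (Simpson's rule on [|t|, upper])."""
    t = abs(t)
    h = (upper - t) / steps          # steps is even, as Simpson's rule requires
    total = t_density(t, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(t + i * h, df)
    return 2.0 * total * h / 3.0


def critical_t(alpha, df):
    """Critical |t| whose two-tailed probability is alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n_means = 10
n_comparisons = n_means * (n_means - 1) // 2      # 45 comparisons
per_comparison = 0.01 / n_comparisons             # rate giving EP = .01
print(f"per comparison rate: {per_comparison:.5f}")
print(f"t for .01 per comparison, 30 df: {critical_t(0.01, 30):.2f}")
print(f"t for .01 per experiment, 30 df: {critical_t(per_comparison, 30):.2f}")
```

The only step specific to the rate per experiment is dividing the desired rate by the number of comparisons; everything else is an ordinary look-up of t.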

By methods which we shall not 
discuss here, the EW rate can also 
be used to find the critical value of t.
In the example we are considering,
the EW rate turns out to require a
value of 4.07 for t. The slight difference
is partly due to the gaps in the
table of t, so that the 4.1 is only approximate.
Table 2 gives some further
examples of comparative critical
values of t for experimentwise and
per experiment rates of .01.

Fortunately, as the above exam- 
ples show, it is usually not necessary 
to make the difficult decision be- 
tween rates per experiment and ex- 
perimentwise in terms of the logic of 
the experimental interpretation. In 
practice it becomes merely a matter
of computational convenience.


TABLE 2

CRITICAL VALUES OF t FOR TESTING
DIFFERENCES AMONG SEVERAL
MEANS FOR ERROR RATES OF .01

[Table body not recoverable. Columns: No. of Means; df for error
(30, 60, 120); critical value of t for EP = .01 (approximate)(a);
critical value of t for EW = .01. For 20 means with 60 and 120 df
an entry reads “not covered by tables.”]

(a) To avoid interpolation, values are taken to the
next tenth above the critical value.


Significance tests vs. confidence 
ranges. Several of the methods now
available for multiple comparisons
give us conclusions in the form of
statements of significance: “The difference
between A and B is significant,
that between B and C is not
significant, etc.” Others make all
comparisons in terms of confidence
ranges of the difference: “The difference
between A and B is from 2 to
15, the difference between B and C
is from -3 to 10, etc.”

When there are only two means to 
be compared, significance statements 
can be rather easily translated into 
confidence ranges, and vice versa. In 
the case of multiple comparisons, 
however, the relationship is more 
complex. For example, the confidence
range for the difference between
B and C above is from -3 to
10, yet a method of testing for significance
of differences, with the same
error rate, might label the difference 
between B and C as significant. 

This discrepancy arises because the
most sensitive or powerful signifi- 
cance tests apply a different criterion 
of significance to different pairs of 
means, depending on how far apart 
they are in the total group. Thus 
the two extreme means must be far- 
ther apart for significance than two 
which are next to each other. The 
methods of determining confidence 
ranges have developed a single “allowance”
which is applied to each of
the differences regardless of where 
the means are in the total group. 

Tukey has argued for the almost 
universal use of confidence ranges in- 
stead of significance statements, bas- 
ing his case primarily upon the point 
that confidence ranges contain more 
information and information which 
is more useful to future researchers 
than a statement of significance. 
Whether or not he is correct in this 
contention, the fact remains that
most of our familiar statistical tools 
(e.g., the F and chi-square tests) are 
significance tests. As a result most 
psychological researchers are more 
accustomed to thinking in terms of 
significance. It will therefore require 
a long period of readjustment if 


Tukey’s point of view is to prevail. 


Because it is more in keeping with 
current practice and ways of think- 
ing in psychology, most of this paper 
is couched in terms of statements of 
significance. It is important to re- 
alize, however, that there is more dif- 
ference between the two approaches 
when we are involved in multiple 
comparisons than there is in the 
simple case of two means. 

Comparisons and contrasts. All of
the discussion so far has been directed
at the comparison of one mean with
another mean in the group. Sometimes,
however, other problems arise.
We might, for example, wish to divide
the means into two groups of
means, by inspection of the data, and
to state whether the two groups of
means differ significantly from each
other. In the literature of this field,
the term contrast is used for the comparison
of any combination of means
with another combination. Contrasts
include cases where the means are
combined with differential weights
for different groups.

Some of the procedures now available
make it possible to test for significance
or place confidence limits
on all possible contrasts among the
means of a given experiment. Methods
which are effective for such
broad purposes are not so effective,
however, for the case of simple comparisons
of one mean with another.

SPECIFIC METHODS FOR MULTIPLE
COMPARISON OF MEANS

We shall describe very briefly
some procedures which have been
proposed for solving the problem
of multiple comparisons. Tukey’s
methods will be stressed especially
because of their convenience, because of
their control of experimentwise error
rates for all null hypotheses, and because
of their special suitability for
simple comparisons of means.

Several of these procedures involve
something which we may call a “layer
method.” By this we mean that
the observed means (or other meas- 
ures) are first ranked from low to 
high. A first test is applied to the dif- 
ference of the extremes. If this dif- 
ference is not considered significant 
by the rules of the method, no fur- 
ther tests are made. If the extremes 
are significantly different an extreme 
value is tested against the value next 
to the other extreme, and so on. The 
effect is that no differences within a 
group or any subgroup of means can 
be considered significant if the ex- 
tremes of the subgroup are not sig- 
nificantly different. The size of 
difference required for significance 
changes with the separation of the 
means in the rank-ordered array. 
The layer method would be con- 
trasted with a method by which a 
fixed interval is required for signifi- 
cance of any difference, no matter 
where the means fall in the rank order
of the results. It would also be
contrasted with a “gap” method in
which adjacent means are tested at
once. If a gap is significant then all 
means on one side of the gap are sig- 
nificantly different from those on the 
other side. Tukey’s earlier method 
(1949) was based upon gaps, but he 
has since abandoned the procedure 
as unsatisfactory. 

The method of Newman and Keuls.
This procedure, first suggested by
Newman (1939) and refined by
Keuls (1952), controls the error rate
experimentwise only for the complete
null hypothesis. It is based upon
“layers” and uses the distribution of
the range of “studentized” means as
a test. The level of the test is kept at
the same nominal value throughout.
As a result, the error rate can rise to
mα/2 (m being the number of means,
α being the nominal significance
level) when the means are equal in
pairs. Tukey’s method, discussed below,
is a modification which corrects
this difficulty.

Duncan's procedures. Duncan 
(1955) has presented tables for two 
different layer procedures. One is 
based upon an F test for each subgroup,
and the other is based upon the
range. In both cases, however, the
error rate is set at α per degree of
freedom, so that (m − 1)α is the error
rate experimentwise. We have agreed
with Tukey, however, in considering 
this compromise basis of controlling 
error as still open to objection. Those 
who prefer this approach will find 
Duncan’s tables useful. 

Bechhofer’s method. Bechhofer 
(1954; Bechhofer, Dunnett, & Sobel, 
1954; Bechhofer & Sobel, 1953) has 
considered a problem related to that 
of multiple comparison, but one 
which is formulated in a quite dif- 
ferent way. Instead of testing the 
null hypothesis that all means are 
equal, he assumes a situation in 
which we already know that there 
must be differences among the means. 
The problem is not one of testing 
significance, nor of setting confidence 
limits, but of finding the relative order
of the means with a predetermined
probability of being correct, or
of choosing only the highest of the 
means, the first and second highest, 
etc. Since Bechhofer is dealing with 
such a different problem from the one 
we have been considering, we shall 
not attempt to describe the method 
here. Those who are concerned with 
this kind of problem in psychological 
work are referred to Bechhofer’s 
papers. 


Tukey's methods. As we have 
stated above, Tukey (see Footnote 
2) has made the most detailed and 
thorough analysis of the problem of 
multiple comparisons. It is unfor- 
tunate that his paper is not generally 
available and has not yet been pub- 
lished. If it had been, the present ac- 
count could be much shorter. It will 
be remembered that Tukey favors 
an approach based upon confidence 
limits, rather than statements of sig- 
nificance, but he has presented meth- 
ods for both in his paper. Using the 
procedures he has developed or 
adapted we can: (a) set simultaneous 
confidence limits for all differences 
among means in a one-way analysis 
of variance design at the level α experimentwise;
(b) set simultaneous
confidence limits for all comparisons
among groups of means and all linear
functions of the means (e.g., 2M₁
+ M₂ − 3M₃) (“contrast” and “universal”
allowances); (c) set simultaneous
confidence limits for all comparisons
or contrasts in a two-way
analysis using a variablewise (“familywise”)
error rate, that is, an
error rate of α for each dimension of
the analysis; (d) set confidence limits
for interactions; (e) make simul- 
taneous significance tests on any of 
these. 

In addition, he has developed three
different methods of calculation:
(a) one based on variances and t ratios of
the ordinary sort, (b) one using short cuts
based upon ranges instead of calculating
standard errors, and (c) an intermediate
method which he calls
the “half-cut” procedure. The tables
for the short-cut method at the 5% 
level for both one-way and two-way 
analysis of variance, have been made 
available to psychological researchers 
by Mosteller and Bush (1954, pp. 
304-307). Tukey has also analyzed 
the effects of non-normality of the 
populations upon these procedures, 
concluding that the short cut procedures
are most sensitive to the effects
of non-normality and the longer
procedures are safest if there are 
doubts about the normality of the 
populations. Unfortunately, Mostel- 
ler and Bush’s brief presentation does 
not make clear that the short cut is 
sensitive to non-normality, nor do 
they make explicit the issue of error 
rates. The reader may not realize 
that, in two-way analysis, the error 
rate is 5% familywise or 10% experimentwise.

In an appendix to this paper, we 
shall outline the specific computa- 
tions in applying Tukey’s method to 
multiple comparisons of means. Here 
we shall only indicate the general 
principle involved. We already have 
available the probability distribution 
of the range of m means, based upon 
an estimated standard error of the 
mean (the tables of the “studentized”
range [Pearson & Hartley, 1954]).
The upper 5% point of this distribution
is the range which is exceeded
by only 5% of samples of m means
under the complete null hypothesis.
Therefore 95% of the time none of
the differences among the means in
the group will be greater than the
value given at the “5% point.” We
therefore take as our confidence
range ± this 5% value for all of the
differences in the group. The value
is added to and subtracted from each 
of the observed differences, and 
Tukey shows that the probability is 
95% that all of these confidence 
ranges will include the population 
values, whether the true differences 
are zero or any other value, i.e., 
whether the complete null hypothesis 
holds or not. By applying suitable 
factors to these same “allowances” 
they can be used for setting confi- 
dence limits for any linear combina- 
tion of means, for interactions, and 
so on. 

In setting confidence limits, the
same allowance is applied to all differences.
If we wish to test significance,
however, Tukey finds it possible
to reduce the allowance as we
deal with means which are closer and
closer together in the observed series,
without reducing the experimentwise
error rate. For example, the range of
a subgroup of four adjacent means is
tested by the mean value of two 5%
points: (a) that for the range of four
isolated means, and (b) that for the
whole m means of the total group. It
is a compromise between the allow- 
ances used by Newman and Keuls 
and those used for the confidence 
limits. 

Scheffé’s contrast allowances. This 
method (Scheffé, 1953) is similar to 
Tukey’s in its application, but it is 
based upon the F distribution rather 
than the range. For any given value 
of F there is a maximum difference 
which can occur between any pair of 
means. If we find this difference for 
the value of F at (say) the 5% point 
of F, then no difference can exceed 
this critical value unless F is in the 
extreme 5% area of its distribution. 
Therefore this difference will be ex- 
ceeded not more than 5% of the 
time. Similarly, there is a maximum 
value which can be attained by any 
particular contrast among the means 
(e.g., 3M₁ + M₂ + M₃ − 5M₄), including
means of groups of means,
weighted sums of groups of means in 
any possible combination, and so on. 
These values are used to set confi- 
dence limits for each contrast, with a 
specified experimentwise error rate. 

The allowances for simple differ- 
ences obtained by the Scheffé method 
are larger than those obtained from 
Tukey's method based upon ranges. 
For the ordinary multiple comparison
problem, Tukey’s method is
therefore more powerful or more sensitive.
For contrasts involving several
means, especially where we wish
to divide the whole set of means into
two groups and study the difference
of the two groups, Scheffé’s method
is the more sensitive. Where we are
interested solely or primarily in simple
differences between pairs of
means, we would not choose Scheffé’s
method. (Note that we must choose
between the methods for any particular
experiment; we cannot use
both.) Tukey’s analysis of the problem
includes a strong case for using
the simple comparisons as the pri- 
mary basis for evaluating a method. 
There are serious logical difficulties 
involved in comparing groups of 
means or other contrasts, difficulties 
which would require too much space 
to specify here. 
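
The relation between the two kinds of allowance can be illustrated numerically. The following sketch is not taken from Scheffé's or Tukey's papers. It assumes equal group sizes, writes a contrast as a weighted sum of the means with weights summing to zero, and uses the usual textbook forms: a Scheffé allowance of s·sqrt((m − 1)·F·Σc²) and a range-based allowance of SR·s·½Σ|c|. The function names are hypothetical, and the critical values (about 4.30 for the studentized range of 6 means and about 2.53 for F with 5 and 30 degrees of freedom, both at the 5% level) are assumed for illustration and should be checked against the published tables.

```python
import math


def scheffe_allowance(coeffs, s, f_crit):
    """Scheffe-type allowance for the contrast sum(c_i * M_i).

    coeffs : contrast weights, one per mean (assumed to sum to zero)
    s      : standard error of a single mean (equal group sizes assumed)
    f_crit : upper percentage point of F with (m - 1, v) degrees of
             freedom, supplied by the caller from a table
    """
    m = len(coeffs)
    return s * math.sqrt((m - 1) * f_crit * sum(c * c for c in coeffs))


def tukey_allowance(coeffs, s, sr):
    """Range-based allowance for the same contrast.

    sr : upper percentage point of the studentized range for m means,
         supplied by the caller from a table
    """
    return sr * s * sum(abs(c) for c in coeffs) / 2.0


if __name__ == "__main__":
    s = 1.0          # hypothetical standard error of a mean
    sr_6_30 = 4.30   # assumed table value: studentized range, 6 means, 30 df
    f_5_30 = 2.53    # assumed table value: F with 5 and 30 df

    simple = [1, -1, 0, 0, 0, 0]                          # one mean against another
    split = [1 / 3, 1 / 3, 1 / 3, -1 / 3, -1 / 3, -1 / 3]  # three means against three

    for name, c in (("simple difference", simple), ("three vs. three", split)):
        print(name,
              "Scheffe:", round(scheffe_allowance(c, s, f_5_30), 2),
              "range-based:", round(tukey_allowance(c, s, sr_6_30), 2))
```

With these assumed values the simple difference receives the larger allowance from the F-based form (about 5.03 against 4.30), while the three-against-three contrast receives the smaller one (about 2.90 against 4.30), which is the pattern described in the text.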

Some other procedures. One special 
case of multiple comparisons occurs 
when one group is a control and we 
wish to compare all other groups 
with this one control. Dunnett 
(1955) has treated this case and has 
presented tables for controlling the 
experimentwise error rate. 

Another problem occurring frequently
is the comparison of frequencies
of occurrence or proportions in
multiple classes. Relatively little has
been done on this problem, except
for the special case of choosing the
class with the highest proportion.
The latter case is discussed by Kozelka
(1956). The general multiple
comparison problem for proportions,
where we wish to compare the proportion
in each class with each other
class, can be solved readily if we wish
to work with an error rate per experiment.
We merely apply ordinary two-sample
procedures to each pair using
a probability level of α divided by
the total number of comparisons
(m(m − 1)/2). Then the error rate
per experiment will be α and the experimentwise
error rate cannot be
larger than α. How much smaller the
error rate experimentwise would be
still remains to be determined.
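
The rule just stated can be sketched in a few lines. The example assumes that each class is an independent sample with its own count of "successes," and it takes the ordinary large-sample test of two proportions (pooled standard error, normal approximation) as the "ordinary two-sample procedure"; both are assumptions of this sketch, as are the function names and the counts, which are invented for illustration.

```python
import math
from itertools import combinations


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p value for the large-sample test of two proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # erfc(|z| / sqrt(2)) equals twice the upper normal tail beyond |z|
    return math.erfc(abs(z) / math.sqrt(2))


def pairwise_proportion_tests(counts, sizes, alpha=0.05):
    """Test every pair at alpha divided by the number of comparisons."""
    m = len(counts)
    n_comparisons = m * (m - 1) // 2
    level = alpha / n_comparisons      # probability level used for each pair
    significant = []
    for i, j in combinations(range(m), 2):
        p = two_proportion_p(counts[i], sizes[i], counts[j], sizes[j])
        if p < level:
            significant.append((i, j))
    return level, significant


if __name__ == "__main__":
    # Hypothetical data: three classes of 100 cases each
    level, sig = pairwise_proportion_tests([30, 45, 60], [100, 100, 100])
    print("level per comparison:", round(level, 4))
    print("significant pairs (0-based):", sig)
```

With three classes the level for each comparison is .05/3, and in this invented example only the first and third classes differ significantly.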

Other cases of multiple tests. As
noted at the beginning, there are
other situations in which a number of 
statistical tests are made upon one 
set of experimental results. While 
we have had space to discuss at 
length only the problem of multiple 
comparisons, we must emphasize that 
the same fundamental issues are in- 
volved in the other cases as well. 

In all of the cases there is the question
of whether to base the error rate upon the
individual comparison (as is frequently
done in the literature) or to
consider the error rate in relation to
the experiment as the unit. Conclusions
upon these other cases will not
necessarily be the same as for multi- 
ple comparisons, since the purpose of 
the statistical analysis is different in 
each situation. Each of the cases re- 
quires an analysis similar to the one 
which we have made for the case of 
multiple comparisons, and, as yet, 
little has been done on most of them. 

As an example of the problems in- 
volved, we shall consider briefly just 
one of the other cases—that of mul- 
tiple F tests in a factorial experiment 
(Case 2 on p. 27). Hartley (1955) 
has described a method for control- 
ling the experimentwise rate of error 
in a multivariable analysis of vari- 
ance. By this method it is possible to 
test each source of variance in such a 
way that there is a specified proba- 
bility that there will be one or more 
incorrect conclusions in the total ex- 
periment. Hartley does not go into 
detail, however, as to the problem of 
deciding when the experimentwise 
rate should be used. 

The present writer believes that 
the same arguments which support 
the experiment-based error rates for 
multiple comparisons would also ap- 
ply to multiple F tests. In the mul- 
tiple comparison situation the exper- 
imenter can increase the probability 
of finding some (erroneously) signifi- 
cant results by studying more and 
more levels of the variable in the experiment
and by basing his significance
tests on the single comparison
rate. In the same way, one can in- 
crease the probability of finding some 
significant F ratios in an experiment 
by complicating the experiment with 
more and more irrelevant variables, 
while continuing to base the error 
rate upon the individual F. For ex- 
ample, in a factorial design with five 
variables there could be as many as 
31 F ratios. If each were tested at the 
standard “.05 level,” the probability
that some of them would turn out to 
be significant is almost .80, under the 
null hypothesis. 
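
The arithmetic behind these figures can be checked directly. The sketch below assumes that the F tests are independent, which is only an approximation when the ratios share a common error term; the function names are hypothetical.

```python
def n_f_ratios(n_factors):
    """Number of main effects and interactions in a full factorial design."""
    return 2 ** n_factors - 1


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more significant results under the null hypothesis,
    assuming the tests are independent."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    k = n_f_ratios(5)
    print(k, "F ratios")
    print("P(at least one significant) =", round(prob_at_least_one(k, 0.05), 3))
```

For five variables this gives 31 ratios and a probability of about .796, in agreement with the "almost .80" quoted above.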


SUMMARY 


We have considered several basic 
issues involved in multiple compari- 
sons. Our present position on these 
problems is as follows: 

1. In general the same procedures
should be used, whether the direction
of differences has been predicted in
advance or not. The same procedures
which apply to comparisons suggested
by the data should be applied
when the comparisons have
been specified in advance.

2. In general, the experiment
should be used as the unit in computing
error rates, rather than the individual
comparison or test.

3. Following Tukey’s lead, the er- 
ror rate should be determined on the 
basis of that null hypothesis which 
maximizes the rate. 

4. The error rate per experiment is 
an upper limit for the error rate ex- 
perimentwise, and therefore provides 
a conservative test which can be used 
when the experimentwise rate can- 
not be computed. 

5. The choice between the two experiment-based
error rates is usually
one of convenience, since they differ
but little numerically in the cases
where both procedures are available.

6. The relative advantages of con- 
fidence limits vs. significance tests 
have not been treated in this discus- 
sion, but it is pointed out that the 
two methods do not lead to parallel 
conclusions in the case of multiple 
comparisons. 

7. Several of the available methods 
for multiple comparison are reviewed 
briefly. 


APPENDIX 


TUKEY’S METHOD FOR MULTIPLE
COMPARISONS⁶


We present here a brief set of instructions 
for Tukey’s method for comparing individual 
means, making use of the tables of the “studentized
range.” These tables, as published
in Pearson and Hartley (1954, pp. 176-177)
permit the comparison of up to 20 means in a
group at either the 5% or 1% level experimentwise.
Tukey has developed a table
covering larger groups at the 5% level (see 
Footnote 2) but it has not yet been published. 


The following symbols are used throughout:

s    The standard error of any of the individual
     means. In ordinary analysis of variance
     this is the square root of “mean square
     for error” divided by √a, where a is the
     number of cases upon which each mean is
     based. We assume that all groups are of
     equal size.

v    Degrees of freedom in the determination
     of s.

SR   Percentage point of the studentized
     range as read from the table.

WSD  (Tukey’s abbreviation for “wholly
     significant difference”) The final allowance
     used in establishing confidence limits or
     determining significance of individual
     comparisons: WSD = SR · s.

n    Number of means in the group being
     compared.⁷


6 See Footnote 2. 

7 Unfortunately, symbols appropriate to the 
tables in Pearson and Hartley may be confusing
because they fail to correspond to common
practice in the psychological literature.
The reason is that the tables are set up in 
terms of the range of individual values rather 
than a range of group means, and we must 
make appropriate translations of terms. 

A. CONFIDENCE LIMITS FOR
ALL COMPARISONS


1. Determine s.

2. Determine SR by reading the table of the
studentized range for the appropriate confidence
level (5% or 1%), degrees of freedom
(v), and n. Note that SR is the upper 5% or
1% point.

3. Find WSD by multiplying SR by s. 

4. Adding and subtracting WSD for any 
given difference between a pair of means will 
give the appropriate confidence limits for the 
difference of that particular pair, with the 
error rate controlled experimentwise. 
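
Steps 1 to 4 can be sketched as follows. The value of SR must still be read from the studentized range table; the figure used in the example (4.30, for 6 means and 30 degrees of freedom at the 5% level) and the standard error of 2.0 are assumptions made for illustration, and the function name is hypothetical. The six means are those of the illustrative series used in Section B.

```python
from itertools import combinations


def tukey_confidence_limits(means, s, sr):
    """Simultaneous confidence limits for all differences between means.

    means : the observed group means
    s     : standard error of a single mean (equal group sizes assumed)
    sr    : percentage point of the studentized range, read from the
            table for n = len(means) and the error degrees of freedom
    """
    wsd = sr * s                                    # step 3: WSD = SR * s
    limits = {}
    for (i, a), (j, b) in combinations(enumerate(means), 2):
        diff = b - a
        limits[(i, j)] = (diff - wsd, diff + wsd)   # step 4
    return wsd, limits


if __name__ == "__main__":
    means = [8, 19, 22, 23, 27, 28]
    wsd, limits = tukey_confidence_limits(means, s=2.0, sr=4.30)
    print("WSD =", round(wsd, 2))
    for (i, j), (lower, upper) in limits.items():
        print(f"M{j + 1} - M{i + 1}: {round(lower, 2)} to {round(upper, 2)}")
```

A difference whose limits exclude zero differs significantly at the corresponding experimentwise level; the same allowance is used for every pair.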

B. SIGNIFICANCE TESTS 
Definitions 

For any given pair of means, let k be the
number of means in the subgroup including
the two means, i.e., two plus the number of
means between them. For example, in the
series 8, 19, 22, 23, 27, 28, when we test the
difference between 19 (M₂) and 27 (M₅), k
is 4.

In the following steps “testing a given pair
of means” will mean the following: (a) Find
SR corresponding to n, the total number of
means being compared. (b) Find the value of
SR corresponding to k for that group (i.e.,
reading the table for k in place of n). (c) Find
the mean of these two values of SR. (d) Find
the mean WSD by multiplying s by the mean
SR. (e) The difference between the pair of
means is considered significant if it is greater
than this mean WSD, but there are also restrictions
on the order of testing to be described
below.


Procedure: 


1. Determine s (see definition). 


2. Arrange the means in order of magni- 
tude. 

3. Test the difference between the extreme
values, using the WSD for the total number
of cases. (For the extreme values, k = n.) If
the extreme means are not significantly different,
no further tests are made, and we conclude
that there are no significant differences
in the group.


If the extremes are significantly different: 

4. Test each extreme mean against the
mean next to the other end of the array, using
the mean WSD for k = n − 1. If neither of these
tests is significant, we stop with the conclusion
that only the extremes of the group differ
significantly.


If either or both of the tests are significant: 


5. Test all subgroups with k = n − 2. Continue
until all subgroups of a given size are
found to be nonsignificant.


Note that in testing the differences by 
layers as described in the procedure above, a 
difference cannot be significant unless the 
particular pair of means is also surrounded 
by another pair which have been found to 
differ significantly. To begin with, none of the 
differences are significant unless the extreme 
values differ significantly. In general, any 
particular significant pair must belong to a 
larger group of which the extremes are significantly
different. We may illustrate with a
pair which is adjacent and near the middle of
the range, say M₃ and M₄ (i.e., 22 and 23 in
the example above). Before M₃ can differ
significantly from M₄, either M₃ must be found
to be significantly different from M₅, or else
M₄ must differ significantly from M₂. Each
of these pairs depends in turn upon the sig- 
nificance of means which have two means be- 
tween them, and so on. 
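
The layer procedure of Section B can be sketched as follows. This is one reading of the instructions, not Tukey's own computing routine: a pair is tested only if it spans the whole group or if at least one of the two subgroups one step larger that enclose it has already been found significant, as in the note above. A stricter reading, requiring both enclosing subgroups to be significant, would change only the line that sets `enclosed`. The studentized range values in the example are assumed 5% points for 30 degrees of freedom and should be checked against the table; the standard error of 2.0 and the function name are likewise hypothetical.

```python
def layer_tests(means, s, sr_table):
    """Significance tests by layers for all pairs of means.

    means    : observed group means
    s        : standard error of a single mean (equal group sizes assumed)
    sr_table : dict mapping subgroup size k (2..n) to the studentized
               range percentage point for k means, read from the table
    Returns the significant pairs as (lower mean, higher mean) tuples.
    """
    ranked = sorted(means)                 # step 2: arrange in order of magnitude
    n = len(ranked)
    significant = set()                    # index pairs (i, j) into ranked
    for k in range(n, 1, -1):              # largest subgroups first
        # mean of the SR for n means and the SR for k means, times s
        mean_wsd = s * (sr_table[n] + sr_table[k]) / 2.0
        found = False
        for i in range(n - k + 1):
            j = i + k - 1
            enclosed = (k == n
                        or (i - 1, j) in significant
                        or (i, j + 1) in significant)
            if enclosed and ranked[j] - ranked[i] > mean_wsd:
                significant.add((i, j))
                found = True
        if not found:                      # no subgroup of this size significant
            break
    return [(ranked[i], ranked[j]) for i, j in sorted(significant)]


if __name__ == "__main__":
    # Assumed 5% points of the studentized range for 30 degrees of freedom
    sr_30 = {2: 2.89, 3: 3.49, 4: 3.85, 5: 4.10, 6: 4.30}
    for pair in layer_tests([8, 19, 22, 23, 27, 28], s=2.0, sr_table=sr_30):
        print(pair)
```

With these assumed values the lowest mean differs significantly from each of the others, and 19 differs from 28; no other pair reaches significance.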

Emphasis on the complexity of behavior
has led to the development of
standardized test batteries which 
provide several scores at the same 
time, i.e., “profiles.” Although such
instruments may have many advant- 
ages over those which yield only a 
single score, they also introduce a 
number of statistical problems which 
complicate the analysis of the data 
they provide. 

These statistical problems arise in 
part simply because each individual 
contributes a set of scores which are 
not statistically independent meas- 
ures. Our present difficulty in han- 
dling sets of scores from individuals is 
evidenced by the variety of methods 
which have been proposed recently 
for appraising the “pattern,” “level,”
and “scatter” of profiles of individuals
or groups of individuals, and by
the lack of consensus among such 
methods. Data of this type can, of 
course, be analyzed effectively by 
multivariate techniques. However, 
since multivariate analysis is beyond 
the scope of many research workers 
not trained in advanced statistics, we 
are still faced with the need for use- 
ful methods for working with data 
which are essentially multivariate. 

The purpose of this study is to ex- 
amine and compare several methods 
for dealing with two questions which 
usually arise when one works with 
profiles, namely, the formation of
groups and some criterion for deciding
when a “group” exists. By and
large, we will restrict our discussion
to cases where the investigator has
some rationale for grouping the profiles
and wishes to estimate the homogeneity
of the group and the proper
membership of the individuals in it.

1 This investigation was supported by research
grant M-637 from the National Institute
of Mental Health, Public Health Service.

The data for this study are 12 Min- 
nesota Multiphasic Personality In- 
ventory profiles (nine clinical scales) 
from individuals who were tested in a 
clinic setting. These data were se- 
lected because, psychometrically, the 
MMPI is similar to many other tests 
in so far as it claims to have the fol- 
lowing properties: (a) The scales 
measure several basic aspects of be- 
havior, e.g., attitudes, values, or per- 
sonality dimensions, (b) each of the 
scales is standardized on rather large 
referent populations, and (c) al- 
though each scale is somewhat in- 
dependent, no one scale can be in- 
terpreted properly apart from the 
others. 

These MMPI data were selected
also because independent clinical diagnoses
were available for each of the
12 individuals, who fell into the following
clinical groups: three “hypertensives,”
four “neurotics,” and
five “psychotics.” In the discussion
to follow, we will utilize these diagnoses.
The standardized MMPI
scores for these Ss are given in Table
1.²

2 We are grateful to Robert E. Harris,
Langley Porter Clinic, San Francisco, for providing
us with the MMPI profiles and clinical
choose among the several factor analytic
methods now available to form
groups of profiles. His choice will de- 
pend, among other things, upon: his 
personal inclination to rely on the 
methods themselves or upon his own 
insights and judgments, the degree of 
methodological elegance desired and 
its status value to him, the amount of 
information he has regarding possible 
groupings, and various “practical” 
considerations such as how much 
time, money, machinery, and assist- 
ance he has to get the job done. 

In addition to the orthogonal cen- 
troid method (Thurstone, 1941), we 
have selected the oblimax rotational 
solution (Pinzka & Saunders, 1954) 
and the multiple group method 
(Holzinger & Harman, 1941; Thur- 
stone, 1941) for illustrative pur- 
poses. The latter two methods differ 
greatly in that the oblimax method 
leaves the investigator free from hav- 
ing to make any subjective decisions 
after the data have been fed into an 
electronic computer, whereas the 
multiple group method requires the 
investigator to form tentative group- 
ings before beginning the analysis. 

The results obtained by these fac- 
tor analytic methods are as follows: 

The orthogonal centroid method.— 
Since the “factors” or profile groups
obtained by this method are uncorre- 
lated, this method is useful when the 
profile groups actually are independ- 
ent, when not more than one clear- 
cut group exists, or when the inves- 
tigator is interested in the one group 
which is most representative of the 
total set of profiles. In the data in 
Table 1, however, the profiles comprising
Groups I and II are somewhat
alike in pattern, but they differ
clearly from those in Group III.
(These relationships are also apparent
from the “correlations among the
factors” in Table 2.) In cases where
more than one nonorthogonal group
exists, the centroid method by itself
provides information to select the
profiles with the highest loadings on
the first “factor” only. Since we
have reason to expect that more than
one group exists among the 12 profiles
in Table 1, and that the groups
are nonorthogonal, some method for
obtaining an oblique solution seems
appropriate. The following method
was used to provide an objectively
obtained oblique solution.

3 The communalities were estimated by 20
iterative approximations according to the
method developed by Dickman for the
ILLIAC, with the result that the mean absolute
residual for the 12 communalities was only
.002.

The centroid method with an oblimax 
solution.—This method enables the 
investigator to rely on the objective- 
ly obtained solution to provide the 
groupings which most closely ap- 
proximate simple structure. Al- 
though the oblimax method requires 
an electronic computer, it has certain 
obvious advantages, especially when 
the investigator has no idea at all of 
how the profiles should be grouped. 
The formation of three groups among 
the 12 profiles is indicated by the fac- 
tor loadings and the correlations 
among the factors, which are given as 
Set A of Table 2. 

The multiple group method.—The 
three clinical diagnoses were used to 
form the tentative groupings for the 
multiple group method in this study. 
This method is a good deal less com- 
plicated and time consuming than 
the centroid method although, if 
tentative profile groupings can be 
formed which are reasonably well 
constituted, the results of these two 
methods do not differ appreciably.⁴


4 In the case of faulty initial grouping, reallocation
of profiles to their proper group
generally can be made from the results of the
multiple group solution. See discussion of
Question 4 below.
Set A
Centroid Solution, Mult 
Oblimax Rotation N 
Part I: Factor Loadings and Correlations 
Groups ( 
Ss 
I II Ill I 
Group I 1 44 .12 07 90 
2 77 36 01 85 
3 77 18 07 1.00 
Group II 4 .07 94 02 56 
5 —.12 | 1.05 05 27 
6 i5 79 —.10 45 
7 13 0 04 52 
Group III 8 23 13 32 40 
9 21 Os 24 83 
10 —.41 |—.13 4, 84 
11 .00 24 30 62 
12 09 |—.25 28 72 
Part II: Correlations among the ‘‘Factors"’ or Groups 
II III 
Group I 55 31 
07 


Group II 


That is to say, the relation between 
the results of these two methods can 
be seen by the use of a transforma- 
tion matrix (Holzinger & Harman, 
1941), which gives the relationships 
among the coordinates of the two 
systems, from which can be obtained 
a matrix containing the cosines of the 
angles (often interpreted as a func- 
tion of the correlation) between the 
two sets of factors. From the trans- 
formation matrix, which is given in 
Table 3, we can assume that the 
centroid factors could have been ro- 
tated so as to give results essentially 
the same as those obtained by the 
multiple group method, which are 
given in Set B of Table 2. 

A comparison of the oblimax and 
multiple group methods reveals that, 
although any decision based on the 
italicized values in Table 2 would re- 
sult in the same choice as to which 
profiles belong together to form the 
three groups, the factor loadings dif- 
.49 | −.73 93 45 72 .&9 07 82
.18 70 .90 .15 |—.70 85 .13 |—.74 
68 59 97 66 .58 95 .40 — .68 
1.00 |—.21 .57 .98 |—.19 -12 89 -.13 
95 07 28 96 0s | —.08 76 .14 
79 34 .44 8&2 |—.31 .39 59 |—.43 
88 19 oe 93 |—.18 .52 79 | —.32 
O8 8] -.38 11 86 —.61 01 86 
22 99 |—.81 17 .98 |—.88 |—.32 88 
37 96 82 33 .96 |—.80 26 .90 
0s 96 | —.60 .09 97 |—.71 |—.14 73 
-.39 91 |—.70 | —.34 91 |—.66 |—.38 35 
II Ill II Ill II Ill 
49 74 47 71 25 78 
18 12 —.13 


fer appreciably for the two methods. 
However, it cannot be expected that 
these two methods will give the same 
numerical results, since the factor 
loadings for the oblimax method are 
in terms of projections on oblique 
axes in oblique space, whereas the 
multiple group (or the rotated centroid)
loadings are in terms of projections
on oblique axes in orthogonal
(i.e., coordinate) space.

Direct correlation method.—A new 
approach to this problem of determining
profile groupings can be illustrated
with these same data. The
rationale for this approach is based
on the following assumptions: (a) if
the mean of a distribution is the
measure which best represents all of
the scores in that distribution, then
for a given set of k profiles, the mean
of the k scores for a given subtest
should be the measure which best
represents all of the k subtest scores;
and (b) generalizing over the c subtests,
the mean score for each subtest
should form the profile which
best represents all of the k profiles
(individuals) in the set. For example,
in Table 1, Group I, the three individual
profile scores for the Hs subtest
are averaged to obtain the mean
Hs score of 71.67. When this procedure
is followed for each of the nine
subtests, the profile thus formed will
be called an empirical criterion profile.


TABLE 3

RELATION OF FACTORS OBTAINED FROM
CENTROID FACTORING TO THOSE OF
MULTIPLE GROUP METHOD*

          I       II      III
I       .996    .497    −.135
II             1.000    −.187
III                     1.003

* Compare also with the correlation among the factors
given in Table 2, Set B.
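
The empirical criterion profile defined above, and the product-moment correlation of an individual profile with it (the use to which the direct correlation method puts it), can be sketched as follows. The scores are invented for illustration and are not the values of Table 1; the function names are hypothetical.

```python
import math


def criterion_profile(profiles):
    """Mean score on each subtest over the k profiles of a group."""
    k = len(profiles)
    return [sum(scores) / k for scores in zip(*profiles)]


def pearson_r(x, y):
    """Product-moment correlation between two profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical standardized scores on nine scales
    group_a = [[68, 60, 72, 55, 50, 52, 58, 54, 48],
               [72, 62, 70, 57, 52, 50, 60, 56, 50],
               [70, 58, 73, 54, 49, 53, 57, 55, 47]]
    group_b = [[50, 52, 48, 60, 55, 72, 70, 75, 58],
               [52, 50, 50, 62, 57, 70, 72, 78, 60]]

    cp_a = criterion_profile(group_a)
    cp_b = criterion_profile(group_b)

    for number, profile in enumerate(group_a + group_b, start=1):
        print("S", number,
              "r with CP_A:", round(pearson_r(profile, cp_a), 2),
              "r with CP_B:", round(pearson_r(profile, cp_b), 2))
```

Each profile would be assigned to the group whose criterion profile it correlates with most highly, and the profile with the highest correlation within a group is the one most representative of that group.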


As with the multiple group meth- 
od, let us assume (at least tentative- 
ly) that the three clinical diagnoses 
provide a reasonable basis for group- 


ing the 12 profiles. And, if we as- 
sume that three meaningful groups 
exist among the 12 profiles, we can 
readily obtain three empirical cri- 
terion profiles, one for each group 
(cf. italicized values in Table 1). 
Given the obtained criterion pro- 
files, we can then compute product- 
moment correlations (rs) between 
each of the 12 individual profiles and 
each of the three criterion profiles. 
For example, in Table 1, the r be- 
tween CP; and the profile for S 1 is 
.93 and the r between CPy; and the 
profile for S 1 is .45, and so on. Fol- 
lowing this procedure we obtain the 
findings given under Set C of Table 
2.5 It will be seen that these rs are 


5 It should be noted that the applicability 
of this procedure is not restricted to test bat- 
teries such as the MMPI. If the investigator 
has collected a set of measures on a reasonably 


rather similar to the factor loadings 
given in Set B, which are called the 
“factor pattern.’’ Furthermore, if we 
intercorrelate the three criterion pro- 
files, we find that the relations among 
these profiles, as indicated by the rs, 
is similar to the relations among the 
factors given in Set B, which are 
called the ‘‘factor structure.” 

From an inspection of the values in 
Sets B and C, it is apparent that the 
findings from these two methods 
agree closely. Of the 36 compari- 
sons between the factor loadings and 
the rs, we find the largest discrepan- 
cy to be approximately .05, with a 
mean absolute difference of approxi- 
mately .02. (We have already seen 
from Table 3 that a rotation of the 
centroid factors by Thurstone’s tech- 
nique would give results which are 
essentially the same as those in Set 
B.) Although the differences between 
the findings for Sets B and C are
unsystematic and small, we are not
prepared to argue that either set is
“correct” and is approximated by the
other. Rather, it seems sufficient at 
this time to say only that these meth- 
ods give results which are prac- 
tically identical. 

The direct method of correlating 
the individual profiles with the em- 
pirical criterion profiles avoids a 
number of disadvantages inherent in 
the factor analytic techniques. Dif- 
ferences in the relative ease of com- 


large number of individuals, he can readily 
transform these raw data into standardized 
scores so that the measures (i.e., scale scores) 
have equal means and equal sigmas before he 
forms groups of profiles. Furthermore, this 
procedure can be used to answer a variety of 
questions: for example, if the investigator 
wished merely to select the one profile which 
is most representative of a particular group, 
he would determine which individual profile 
correlates highest with the criterion profile. 
Thus, in Set C of Table 2, Profiles 3, 4, and 9 
are most representative of their respective 
groups.
putation are obvious, since several
time-consuming operations are
sidestepped by the direct correlation
method. Specifically, this method
does not require the calculation of an
initial correlation matrix, the
estimation of communalities (cf.
Wrigley, 1957), the factorization of the
matrix, or the rotation of the initial
solution that is usually needed to obtain
simple structure. (As with the
multiple group method, when the initial
groupings are well constituted, the
results will approximate simple
structure.) Moreover, the use of
a criterion profile has an important
advantage: it is intuitively
meaningful since it shows which scales
tend to have high or low scores for
the group in question, and hence may
be thought of as the “definition of a
factor.” Such scalar values are not
available when factor analytic
techniques are used.


2. What Aspects of the Profiles Should 
be Considered in Forming Groups? 
In computing the correlations given
in Set C of Table 2, the
product-moment or interclass coefficient (r)
was used. This was done to facilitate
the comparison of Sets B and C in
Table 2 since r, or some approximation
of it, characteristically has been
used in factor analytic techniques.
This is not to say, however, that r is
the most appropriate measure to use
when one is interested in forming
groups of profiles based on
standardized test batteries. The bivariate
statistic r always equates the two
variates being correlated by reducing
them to deviation scores, so that
only a measure of the similarity of
the paired standard scores
(sometimes called pattern) is reflected in r.
However, it is inevitable that any
information regarding differences in
profile means (sometimes called level)
and profile sigmas (sometimes called
scatter), like “poor Clementine,” is
lost and gone forever when r is used.
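
The point can be checked numerically. In the short Python sketch below, the three profiles are invented for illustration: they share one pattern but differ in level or in scatter, and r cannot tell them apart.

```python
import numpy as np

base = np.array([50.0, 60.0, 70.0, 55.0, 65.0])   # hypothetical profile
raised = base + 20.0                               # same pattern, higher level
spread = 60.0 + 2.0 * (base - base.mean())         # same pattern, greater scatter

r_level = np.corrcoef(base, raised)[0, 1]
r_scatter = np.corrcoef(base, spread)[0, 1]
print(round(r_level, 6), round(r_scatter, 6))
```

Both coefficients equal 1 up to rounding, although the second profile lies 20 points above the first and the third has twice its sigma.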

In forming groups of profiles, the 
coefficient of intraclass correlation (R) 
can be used effectively to reflect any 
meaningful differences that might 
exist among the profile means and/or 
sigmas (Haggard, 1958). That is to 
say, the statistic R enables the in- 
vestigator to consider or to ignore 
these differences in terms of the prop- 
erties of the particular test battery 
and his research questions when 
working with profiles from standard- 
ized tests. More specifically, if he 
wishes to equalize (i.e., disregard dif- 
ferences in) the profile means, he can 
add the appropriate constant to the 
scores in each profile and/or if he 
wishes to equalize the profile sigmas, 
he can divide the scores in each pro- 
file by its standard deviation before 
computing R.6 With these possibili- 
ties for adjusting the profile scores, 
one can obtain four meaningful sets 
of intraclass correlations between the 
individual and the criterion profiles. 
The four possibilities, which can be 
compared with the findings reported 
in Sets B and C of Table 2, are as
follows: (a) the means and the sigmas
of the profiles are equalized (this
method permits grouping profiles in
terms of their pattern only), (b) the
profile means are equalized but the
sigmas are allowed to vary (permits
grouping profiles in terms of their
pattern and scatter), (c) the means
are allowed to vary but the sigmas
are equalized (permits grouping
profiles in terms of their pattern and
level), (d) the means and the sigmas
are allowed to vary (permits
grouping profiles in terms of their pattern,
level, and scatter).


6 In this type of problem we can assume a
one-way analysis of variance design with c
classes (subtests) and k replications (profiles).
The R can then be computed from the mean
squares of the analysis of variance table by
using the formula:
R = (BCMS − WMS) / (BCMS + [k − 1]WMS),
where BCMS is the
between classes mean square and WMS is the
residual or within classes mean square. In
computing R for the individual vs. criterion
profile correlation, c = 9 and k = 2. However,
R can be computed for k of any size; Rs over
the profiles in the three groups (k = 3, 4, and
5) and for all three groups together (k = 12)
are given in Table 4.
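
The formula of Footnote 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name intraclass_r and the scores are hypothetical, the mean squares use the ordinary one-way analysis of variance degrees of freedom (c − 1 between classes, c[k − 1] within classes), and no adjustment is made for profiles whose means or sigmas have been equalized beforehand.

```python
import numpy as np

def intraclass_r(profiles):
    """Intraclass R from a one-way ANOVA with c classes (subtests, columns)
    and k replications (profiles, rows):
        R = (BCMS - WMS) / (BCMS + (k - 1) * WMS)
    """
    x = np.asarray(profiles, dtype=float)   # shape (k, c)
    k, c = x.shape
    class_means = x.mean(axis=0)            # mean of each subtest over profiles
    grand_mean = x.mean()
    bcms = k * np.sum((class_means - grand_mean) ** 2) / (c - 1)
    wms = np.sum((x - class_means) ** 2) / (c * (k - 1))
    return (bcms - wms) / (bcms + (k - 1) * wms)

# Hypothetical profile on c = 9 subtests, and a copy raised by 20 points.
a = [50, 60, 70, 55, 65, 45, 75, 58, 62]
b = [score + 20 for score in a]

R_identical = intraclass_r([a, a])   # k = 2, no level or scatter difference
R_shifted = intraclass_r([a, b])     # k = 2, same pattern, different level
print(R_identical, R_shifted)
```

The product-moment r between a and b is 1, yet R for the shifted pair falls well below 1 because the level difference enters the within classes mean square. The same function accepts any number of rows, so it can also be applied to all k profiles of a group at once.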

Although it would be possible to 
compare all the Rs between the indi- 
vidual and criterion profiles for each 
of these four methods, it will be suffi- 
cient for illustrative purposes to 
compare only methods (a) and (d). 

When profile means and sigmas 
are equalized (a above).—Under these 
conditions, R is reduced to r, so that 
R = r (Haggard, 1958). Consequently,
the findings in Set C of Table 2
can be thought of as intraclass corre- 
lations computed on the profiles with 
equalized means and sigmas. 

When profile means and sigmas are
allowed to differ (d above).—The
possible effect of equating the profile
means and sigmas in forming groups
can be seen by comparing the
findings reported in Sets C and D of
Table 2. In each instance, the italicized
values are reduced in Set D, indicat- 
ing that the measure of correspon- 
dence of the profiles in each of the 
three groups is decreased when dif- 
ferences in the profile means and sig- 
mas are taken into account. This al- 
ways occurs when the profile means 
and/or sigmas differ. It is apparent 
also that this decrease varies from 
profile to profile. In some instances, 
such as with S 8 in Group III, the 
drop in R is only from .862 to .857, 
but S 12 in this same group shows a 
drop from .910 to .354. (The reason 
for the difference in the size of these 
Rs can be seen from an inspection of 
the means and sigmas of the indi- 
vidual and criterion profiles given in 
Table 1.) In connection with these 
findings, if an investigator decided to
drop the least similar profile from
Group III, we would expect him to 
drop either No. 8 or No. 12, depend- 
ing on whether he relied on the find- 
ings in Set C or those in Set D. 

From the above results it seems
apparent that, if one assumes
differences in profile means and sigmas
to be important aspects of the
profiles to be grouped, all of the
correlations in Set C are too high and some
of them may be quite misleading. It
also follows that, under the above as- 
sumption, the familiar factor analytic 
techniques which utilize r will yield 
results which suffer the same short- 
comings when used to group indi- 
viduals on the basis of profiles from 
standardized test batteries. The co- 
efficient R is a more general and flex- 
ible measure of correlation with this 
type of data and should be used 
when the investigator is interested in 
possible differences in the profile 
means and/or sigmas. 

An additional question that may 
arise in grouping profiles has to do 
with the degree of homogeneity of a 
group of profiles taken together. In- 
traclass correlation can be used as a 
general descriptive measure to indi- 
cate group homogeneity since it can 
be computed readily with k of any
size (see Footnote 6). To illustrate 
this possibility, the three profiles in 
Group I, the four profiles in Group 
II, and the five profiles in Group III 
were correlated both under condi- 
tions where the profile means and 
sigmas were equalized and where 
both were allowed to vary. The re- 
sults are given in Table 4. One can, 
of course, use Rs computed on a set 
of k profiles as a general criterion or 
distance measure, such as by requir- 
ing that R reach some specified value 
(e.g., .70 or .85), before consider- 
ing a collection of profiles to be a 
“group.” 

TABLE 4

INTRACLASS CORRELATIONS FOR GROUPS OF PROFILES

                                        Group I   Group II   Group III   Total(a)
                                        (k = 3)   (k = 4)    (k = 5)     (k = 12)

R (profile means and sigmas are
  equalized, i.e., R = r̄)(b)             .809      .804       .843        .124

R (profile means and sigmas not
  equalized, i.e., R ≠ r̄)                                                 .057

(a) The Rs of .124 and .057 indicate the over-all agreement among all 12
profiles, i.e., the degree to which a general factor exists in the total
set of data.

(b) When k > 2, R = r̄, the average of the k(k − 1) possible rs when the
profile means and sigmas are equalized before computing R (cf. Haggard,
1958).


3. What Types of Criterion Profiles
May Be Used in Forming Groups?
By using either factor analytic
techniques or empirical criterion
profiles, one’s findings are necessarily
restricted to the members of the
particular sample studied, and the best
possible results of such methods
would provide one or more groups of
profiles which are most like each
other in that sample. The findings of
such methods are, of course,
influenced by various sampling artifacts,
since the groups are formed only on
the basis of the profiles which are
included in the sample.

In various research situations an
investigator may not wish to be
dependent upon the profiles in a particular
sample in order to form groups of
individuals. For example, he may wish
to form groups in terms of one or
more a priori “ideal” profiles which
are based on theoretical
considerations (cf., e.g., Abel et al., 1956). Or,
in order to repeat a previous study,
he may wish to form groups of
individuals with profiles as similar as
possible to those of groups studied
previously by himself or by others. With
the procedures discussed thus far, it
is not possible to form groups around
such a priori profiles.

The flexibility of the direct
correlation method can be extended to
utilize a priori criterion profiles. After
such profiles are defined, any of the 
four methods of obtaining intraclass 
correlations discussed under Ques- 
tion 2 can be used to screen individ- 
ual profiles in order to select those 
which correlate most highly with the 
a priori criterion profile(s). The for- 
mation of a group would then depend 
upon the number of profiles and the 
degree of over-all homogeneity re- 
quired for the group or groups. 


4. How Can One Determine the Group
Membership of Individual Profiles?

In view of the findings given in 
Table 2, the grouping based on clin- 
ical diagnoses adequately partitioned 
the 12 profiles in the illustrative data 
which we have cited. But it is not to 
be expected that the groupings will 
always “come out right” the first
time. If they do not, the investigator 
is faced with the problem of reassign- 
ing improperly classified profiles to 
their proper group. 

With either the multiple group or 
direct correlation methods, informa- 
tion of the type given in Table 2 gen- 
erally will indicate the appropriate 
group membership of an individual 
profile if the tentative grouping is 
reasonably correct. That is to say, 
these two methods can be used ef- 
fectively when the investigator is not 
completely ignorant of how the
profiles should be grouped, but wishes to
determine the appropriateness of his 
tentative grouping and to correct for 
any misgrouping. 

This case can be illustrated by the 
following example: Let us assume 
that profile No. 6 had been placed 
(i.e., misplaced) in Group I. For il- 
lustrative purposes this was done 
with the result that the factor load- 
ings and correlations for profile No. 
6 clearly deviate from those of Nos. 
1, 2, and 3 in Group I, and conform 
more nearly to those of Nos. 4, 5, and 
7 in Group II. Such findings indicate 
that No. 6 should be assigned to 
Group II. An additional effect of 
profile misgrouping is to inflate the 
correlations among the factors or cri- 
terion profiles. As the profiles in 
the different groups become max- 
imally homogeneous (i.e., optimally 
grouped), these correlations will ap- 
proach their minimum value; if the 
profiles were assigned at random to 
the groups, these correlations would 
approach +1. 

Questions regarding group
membership also arise when an
investigator has one or more new profiles and
wishes to assign them to the
appropriate existing group. The most
efficient method for doing this is to
correlate the individual profile(s) in
question with the existing empirical
or a priori criterion profile(s).7 The
assignment of an individual profile to 
a group will depend in part upon 
which of the possible correlations and 
which type of criterion profile are 
used, as indicated in the discussion of 
profiles Nos. 8 and 12 under Question 
2 above. It is to be expected that the
same criteria which are used to
define a group in the first place will
apply when new profiles are added
to it.


7 With the factor analytic methods, the
consideration of new profiles would require an
enlarged correlation matrix and refactoring.
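
Assigning a new profile by correlating it with the existing criterion profiles can be sketched as follows. The criterion profiles, the new profile, and the function name assign_to_group are all hypothetical, and the product-moment r is used only for brevity; an intraclass R could be substituted where level and scatter are to count.

```python
import numpy as np

def assign_to_group(new_profile, criterion_profiles):
    """Return the label of the criterion profile with the highest r,
    together with all of the correlations."""
    p = np.asarray(new_profile, dtype=float)
    rs = {
        label: float(np.corrcoef(p, np.asarray(cp, dtype=float))[0, 1])
        for label, cp in criterion_profiles.items()
    }
    best = max(rs, key=rs.get)
    return best, rs

# Hypothetical criterion profiles on five subtests.
criteria = {
    "I": [71, 49, 62, 46, 66],
    "II": [46, 67, 50, 71, 51],
}
new = [68, 52, 58, 50, 63]
best, rs = assign_to_group(new, criteria)
print(best, rs)
```

No correlation matrix among the individual profiles and no refactoring are needed; only one coefficient per criterion profile is computed for each new profile.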


SOME CONCLUDING REMARKS


In this paper we have touched only 
lightly upon, or have bypassed com- 
pletely, two rather important issues. 
They deserve further comment. 

The first issue has to do with the 
fact that the product-moment co- 
efficient r, and more elaborate tech- 
niques based upon it, ignore the sca- 
lar values of the variables which are 
correlated. In the bivariate case, 
where two different scales of meas- 
urement are involved, it is not pos- 
sible to compare the relative position 
of the paired scores in their respec- 
tive distributions and at the same 
time consider the scalar values of the 
paired scores. In such cases r is an 
appropriate measure of correlation, 
but since r always converts the origi- 
nal variables into standard score 
units, it eliminates any informa- 
tion regarding the original scalar 
values. 

Frequently, however, test bat- 
teries which yield a set of scores from 
which profiles can be formed are 
standardized so that the two or 
more variables have a common scale 
of measurement, i.e., a common 
mean and a common sigma. Under 
these conditions, one may wish to 
take account of their scalar values. 
For example, many clinicians assert 
that information regarding profile 
level and scatter must be taken into 
account in evaluating the meaning of 
one or more profiles. In such cases, it 
is clear that the intraclass coefficient 
R, a univariate statistic, can be used. 
Furthermore, it can be shown that, 
in many instances, meaningful groups 
of profiles can be formed only when 
their scalar values are taken into ac- 
count. This point will be discussed in 
more detail in another paper. 

The second issue has to do with the 
possible statistical significance of the
coefficient of intraclass correlation. 
In this paper we have considered R 
only as a distance or descriptive 
measure, although the statistical sig- 
nificance and confidence limits of 
any R can be estimated under the 
proper conditions. But since each of 
the 12 individuals contributed nine 
measures in the profile data which 
we have used, it cannot be assumed 
that these nine measures and their 
standard errors are independent in the 
sense of being uncorrelated. In other
words, we must assume that these
profile data are essentially
multivariate, and, consequently, a univariate
test of significance (e.g., F) cannot be
applied to these data in their present
form.

It is possible, however, to convert 
profile data such as we have used by 
dividing each profile score by its 
standard error of estimate to obtain 
“stabilized scores.” It can be shown
that stabilized scores have statistical 
properties which enable the investi- 
gator to analyze various aspects of a 
group of profiles, such as level and 
scatter differences or the degree of 
over-all homogeneity, and to deter- 
mine the statistical significance of his 
findings. The procedures for carry- 
ing out such pattern analytic studies 
have been presented elsewhere (Hag- 
gard, 1958). 

The paradigm for SPC was
established by Brogden (1939), as was
the name “Sensory Preconditioning.”
The procedure consists of the
following three stages: (a) repeated
contiguous unreinforced presentation of
intersensory stimuli, (b) establishing
a response to one of them, and (c) 
testing transfer of response to the 
other stimulus. Unfortunately, a 
control which is necessitated by this 
procedure has not been utilized in a 
number of experiments (Bahrick, 
1952; Brogden, 1939; Karn, 1947). 
That is, equal exposure to the test 
stimulus must be given to both ex- 
perimental and control groups. Lack- 
ing this control, the eventual differ- 
ence between the group initially pre- 
sented with paired stimuli and the 
control group could be attributed to 
differential familiarity with the test 
stimulus. Consequently, according to 
Reid (1952), these early studies by 
themselves are not conclusive. 

The present study is a review of 
the existing data in this area with an 
attempt at reconciliation of certain 
of the more apparent inconsistencies. 
A descriptive analysis of the experi- 
mental setting will be offered; and, 
later, research will be suggested to 
clarify the existing body of
information concerning sensory
preconditioning (SPC). Within this
approach the paper is directed toward
a general consideration of three
questions: (a) Is sensory preconditioning,
as substantiated by the existing data,
a phenomenon to be dealt with by
learning theory? (b) If so, what laws
of learning does it follow? (c) What
are some of the problems which the
learning theorist faces in attempting
to integrate the data of SPC into his
system?


1 The author gratefully acknowledges the
criticisms and analyses offered by William A.
Shaw, Ronald H. Forgus, and Howard
Ranken during the preparation of this
manuscript.

2 Now at Research Directorate, Air Force
Special Weapons Center, Kirtland Air Force
Base, New Mexico.


THE EXPERIMENTAL EVIDENCE 


Animal studies. The initial
experiment on “sensory preconditioning”
was done by Brogden (1939). Eight 
experimental animals were presented 
with 200 pairings of a bell and a light. 
Secondly, one of these stimuli was 
used as a CS in a shock-avoidance 
setting until a criterion of avoidance 
was reached. During the test trials 
the other stimulus was presented and 
responses to extinction were re- 
corded. The control animals which 
had not been exposed to the precon- 
ditioning pairing gave significantly 
fewer Rs to the unreinforced stimu- 
lus. 

Subsequently, Reid (1952)
performed an experiment with 16 pigeons.
He trained them in a modified
Skinner box to peck for food reward at a
signal. In the test situation, pecking
Rs were counted to the other
stimulus which was presented without
reward. The design was modified so
that the control and experimental
groups were given equal amounts of
exposure to the buzzer and light
during pretraining; however, for the
Control Ss the stimuli were not
paired but presented separately.
With the Ss equated in this manner,
no significant differences were
obtained between experimental and
control groups in number of pecking
Rs to the test stimulus (either buzzer
or light). Reid does summarize
results of an unpublished study, using
pigeons, which was done by
MacPhersen in which Brogden’s original
data were confirmed; however, this
study did not equate amount of ex- 
posure to the test stimulus for the 
two groups. 

Bahrick (1952, 1953), using rats in 
an avoidance situation, obtained 
somewhat ambiguous results. He 
did find that high drive (14-hour food 
deprivation) during preconditioning 
led to greater positive transfer ef- 
fects than did low drive (satiated); 
but the confusing outcome was that 
the control group (under high D) 
showed positive transfer to as great 
a degree as the Low Drive
experimental group (Bahrick, 1953). A
possible explanation for this
occurrence may lie in Bahrick’s use of the
same apparatus for exposure and 
training. Perhaps there was a suf- 
ficient number of cues in the appara- 
tus other than the buzzer to mediate 
the transfer effect to the light. This 
possibility may plausibly account for 
Reid’s data, also. 

Clearcut positive results have been 
obtained recently in an avoidance 
training investigation by Silver and 
Meyer (1954). Unlike Bahrick, how- 
ever, they used an exposure appara- 
tus which was distinctly different 
from that used in the other two 
phases of the experiment. These au- 
thors, using rats, found no signifi- 
cant differences among three control 
groups, one of which had had pre- 
training to the test stimulus alone 
(light or buzzer), one to the training 
stimulus alone (buzzer or light), and 
one with no pretraining experience. 
Apparently, differential exposure to
the test stimulus was unimportant. 
These findings were subsidiary to the 
main purpose which was to relate 


sensory preconditioning to classical 
conditioning by showing that the 
same optimal temporal relationships 
hold for connection of the intersen- 
sory stimuli to occur as for the CS- 
UCS in the Pavlovian paradigm. 
The three experimental situations in 
preconditioning were: simultaneous, 
forward (.5 second between stimuli),
and backward (.5 second between 
stimuli, but the training and test 
conditions were reversed). The re- 
sults partially support their hypothe- 
sis of similarity of the two proce- 
dures since ‘‘forward’’ sensory pre- 
conditioning resulted in greater posi- 
tive transfer effect in the test avoid- 
ance training than either of the other 
experimental conditions. The latter 
two did not differ from one another in 
transfer effect. The fact that the ex- 
perimental groups ac a whole gave 
more avoidance Ks ‘e test situa- 
tion than did the -ontrols indicates 
that “backward” prcco ditioning ex- 
Yet does such a temporal rela- 
tionship between CS-UCS exist for 
classical conditioning? Without con- 
sidering this PC phenomenon neces- 
sarily analogous to what has been 
called ‘‘backward”’ conditioning, at 
this point one might raise the ques- 
tion of a possible difference between 
the two in temporal parameters. 
(Coppock’s S-R analysis [1958] to be 
covered later presents a forward con- 
ditioning interpretation of such an 
occurrence in SPC.) 

Finally, a recent study by the au- 
thor (1958) extended Bahrick’s find- 
ings regarding the role of specific re- 
sponses as possible mediators in 
SPC. Hooded rats were exposed to 
the PC stimuli when food-deprived, 
and later were split into hungry, 
thirsty, and satiated groups during 
avoidance learning (and _ transfer). 
All three experimental groups showed 
the same degree of positive transfer 
when compared to the control group 


ists. 
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(initially exposed only to the test 
stimulus). The reader will note that 
the experimental animals showed this 
equivalent effect despite differences 
in degree of similarity between the 
autonomic response-complex present 
during the preconditioning period 
and that present during the training- 
testing phases. It seems also possible 
then, that in SPC, unlike condition- 
ing, the role of the response is an un- 
important one. 

Human studies. Karn’s study 
(1947) most resembles Brogden’s orig- 
inal design. This author used finger- 
flexion avoidance training. The 12 
experimental Ss (college students) 
received 50 simultaneous presenta- 
tions of buzzer and light; all 24 Ss 
were then trained to criterion to 
avoid shock by responding to the 
buzzer; and finally, all were given 10 
unreinforced trials to the light. The 
control group had no pretraining.
The results agree with Brogden’s 
data, but suffer from the same flaw, 
unequal exposure to the test stimulus 
(favoring the Experimental Ss). 

Brogden’s 1942 study incorporat- 
ing the GSR as the CR met this lack 
and the results turned out negative. 
But the outcome was attributed by 
the author to lack of a reliable meas- 
ure of conditioning; and hence, the 
experiment was not a valid test of
SPC.


In the rest of his experiments,
which were somewhat more
successful, Brogden (1947, 1950; Brogden &
Gregg, 1951; Chernikoff & Brogden,
1949) also controlled for possible
differential effects.

In one study where Brogden (1947)
utilized a reaction time measure
instead of GSR, he was successful in
obtaining the SPC effect. The
training, transfer, and extinction test
procedures were: 30 trials to light; 10
trials to tone; 10 extinction trials to
light. Included were three control
groups: (a) Given no pretraining
(preconditioning) period. This
condition provided a test for sensory
generalization (based upon unequal
exposure to the test stimulus). (b)
Given exposure to the test stimulus
alone equal to that of the
experimental groups. This was the usual
SPC control condition. (c) Given no
pretraining and no transfer test to the
tone. This group acted as the SPC
control condition for the extinction
test of reaction time to the light. All
Ss were told to respond to the light
only and that they would be shocked if
they were too slow. Actually, no
shock was given. The instructions
were given after S had been told to be
seated and E “accidentally” had
presented the preconditioning stimuli
while “fixing” the apparatus.

The transfer test was successful 
in showing SPC. In this test Control 
Groups (a) and (b) did not differ
from one another even though (b) had
the advantage of sensory generaliza- 
tion. The extinction test was not 
successful. It was based upon the as- 
sumption that the unreinforced (no 
shock) tone presentations should 
have extinguished the shock-expec- 
tancy to the greatest degree for the 
experimental group, next for Control 
Groups (a) and (b), and least for
Control Group (c). The first three 
groups showed similar marked ex- 
tinction (latency increase) to the 
light while Group (c) showed none. 
Apparently, the 10 unreinforced tone 
trials, regardless of prior associations, 
were sufficient to extinguish the ex- 
pectancy. 

Chernikoff and Brogden (1949) 
repeated the experiment using elec- 
tronic equipment, and a diffuse tone 
source, all of which seemed to increase 
the efficiency of the experiment since 
positive transfer results were ob- 
tained with only 10 Ss per group 
whereas Brogden had used 42. Also, 
the percentage of Ss responding in
the test series was twice that of the
other study. Another variable in 
this experiment was instructional
variation. One experimental group
was given the same instructions as in
the previous study, while two others
were told “not to respond” or to “do
what seems natural” in the test
situation. Only the group given the old
instructions evidenced a significant
difference from the controls. 

In another experiment (1950) 
Brogden utilized a diffuse source for 
both tone and light in precondition- 
ing. This time he failed to get suc- 
cessful results with the usual meas- 
ures. By adding the procedure of 
measuring absolute auditory thresh- 
olds to the preconditioning tone at 
the end of the experimental sessions, 
he obtained positive results. He 
found that the presence of the light 
with the tone led to greater
“lowering” of the auditory threshold for
the experimental group than for the
controls. The “lowering” is put in
quotes since again the data are sub- 
ject to contamination by the S’s pos- 
sible deliberate pressing of the key 
even when he was in doubt about 
hearing the tone. This is quite pos- 
sible since: (a) he was instructed to 
respond to tone if present even when 
in doubt; and (b) he had previously
experienced the light and tone simul- 
taneously. This again points up the 
need for an involuntary response. 
With regard to the second point, it 
has long been known that facilitation 
takes place when one of these stim- 
uli is supplemented by the other 
(Child & Wendt, 1938). It is possible 
that in this study the difference in the 
facilitation effect (lowering of the 
tonal threshold) between control and 
experimental groups was a result of 
the excess paired presentations of 
tone and light given the experimental 
group. 

Brogden and Gregg (1951)
repeated the threshold procedure in
six experiments with variations on:
(a) sequence of threshold trials (with
and without light), (b) number of
preconditioning pairings, (c) steps in
obtaining threshold, and (d)
illumination (increase or decrease). No
significant t ratios were obtained for the
above variations, but the
experimental group as a whole showed the
same results as the earlier study. In
recalling the proposed relationship
between sensory preconditioning and
classical conditioning, it should be
noted that the strength of a CR is a
function of the number of reinforced
trials. Although the exact value
of this finding cannot be determined
since his data are confounded with a
possible facilitation effect, Brogden’s
data indicate that no such
relationship between frequency of exposure
and association strength exists in
SPC. Brogden also summarized two
unpublished studies which support
the above positive findings.


A recent attempt at sensory
preconditioning with human subjects
was made by Bitterman, Reed, and
Kubala (1953), who wanted to show
that SPC produces as stable an effect
as does classical conditioning. Their
rationale rests on a sensory
integration approach to preconditioning,
and they hoped to indicate that a
Hullian S-R interpretation could not
predict the same results. The S-R
argument presumed is that the need
reduction following PCS
(preconditioning stimuli) would be less than
that following the CS; consequently,
the sER would be weaker following a
given number of sensory
preconditioning trials than for the same
number of conditioning trials.

While their results show no differ- 
ence in response to extinction be- 
tween PCS and CS, their data should 
be evaluated cautiously. First, a dif- 
ficulty in interpreting their findings 
stems from the fact that the two
PCSs were lights differing in position 
on a panel. Technically, then, this 
procedure deviated from the usual 
one since no intersensory relation 
was attempted. Moreover, the re- 
sponse measured was the GSR, 
something which requires extreme 
care when conducting any sort of 
conditioning experiment. Seated in a 
semidarkened room, the precondi- 
tioning group was presented with the 
training and generalization stimuli, 
sometimes with the CS alone and 
sometimes paired with the PCS, so 
that the termination of one coin- 
cided with the onset of the other. 
The conditioning group, on the other 
hand, was presented with each stimu- 
lus on separate trials for the same 
total number of pretraining trials. 
Extinction for all Ss to both the CS 
and generalization stimulus (the 
other PC stimulus for the PC group) 
followed the training session.

Following upon the use of the 
GSR and this procedure, two possi- 
ble weaknesses seem to discount 
these results. To begin with, a GSR 
is elicited by a wide variety of stim- 
uli, and was most probably present 
during the pretraining period; hence, 
the preconditioning paradigm was 
not followed. In order for a valid 
procedure to have been used, it would 
have been necessary first to extin- 
guish the GSR to the experimental 
stimuli. Secondly, since a GSR prob- 
ably did occur to light itself, it is 
quite possible that a summation ef- 
fect occurred in the preconditioning 
group. This would mean that a 
greater GSR could have been elic- 
ited to both lights in the precondi- 
tioning period than for each light 
presented separately in pretraining 
(the condition for the control group). 
Thus, contrary to the implication of
Bitterman, et al., that the stimuli
were initially neutral, it seems probable
that they were not. Further,
the authors asserted that stimulus 
generalization could not explain the 
results. However, if the GSR and 
summation did occur in the manner 
outlined above, then the precondi- 
tioning group should have been con- 
ditioned to a greater degree to light 
stimuli than the conditioning group. 
Consequently, it is quite possible 
that greater stimulus generalization 
could account for the fact that the so- 
called preconditioning group showed 
greater generalization in extinction 
(one measure of preconditioning) 
than did the conditioning group. Unfortunately,
there was no measure of GSR reported for the pretraining
period for either group so that, although highly probable,
the evaluation requires additional data for substantiation.

The latest SPC study (Coppock, 1958) to appear was concerned with
“pre-extinction” and involved the use of GSR in classical conditioning
(shock as UCS). Although the data shed some light on the meaning of
the foregoing experiment, the results raise questions related to the
interpretation of GSR in SPC as a mediating response for S-R theory.

The experiment consisted of four experimental groups and a control
group. The latter was exposed to the PC stimuli (light and tone)
separately on randomly alternated trials. Two of the other groups, PC
and IPC, were analogous to the forward and “backward” PC groups in
Silver and Meyer’s study. The interstimulus interval in the present
study, however, was 1 sec. as compared to .5 sec. in the other; and
Coppock referred to the inverted stimulus presentation as IPC
(inverted PC) rather than “backward.”

The two pre-extinction groups (PE) were treated like PC initially.
Then one group, IPE, was immediately given an equal number of
inverted exposures of the PC stimuli (like IPC). The other group, SPE,
was presented with the first stimulus alone (analogous to unreinforced
CS). Coppock’s predictions based upon an S-R mediation analysis were:
(a) IPE>SPE, (b) SPE>C, dependent upon success of pre-extinction,
(c) IPC>C, and (d) PC>C. His S-R analysis of SPC will be discussed
later in the Theory Section.

The results did not completely 
confirm the proposed S-R hypotheses. 
The nonparametric comparisons 
made by Coppock revealed (a) PC 
>C, (b) SPE>C, (c) IPE>SPE, 
but (d) IPC did not differ significantly from C. Since the
experimental effects were found to be independent of GSR reactivity,
per se, and of procedural variables, it is not clear why the IPC group
should have shown training-extinction effects while the SPE Ss
exhibited no pre-extinction effects. Unfortunately,
certain additional statistical compari- 
sons were not made which would 
have provided a basis for more thor- 
ough theoretical evaluation (viz., 
S-S vs. S-R) in the pre-extinction 
setting. For example, it is apparent 
from the graph of the transfer data 
that IPE was equal to and possibly 
superior to PC. Also, SPE seemed 
equal to PC. Certainly if “extinction”
is to be meaningfully applied in
this experiment, whether from S-S 
or S-R viewpoints, equality of SPE 
and PC presents theoretical difficul- 
ties. In addition, unless it is assumed 
that the SPC association was maxi- 
mum in the PC group, S-S theory 
should have predicted IPE> PC since 
the IPE group had twice as many 
pairings of PC stimuli. Finally, according
to S-S theory IPC should
have done almost as well as the PC
group. (Reversal of S-S appearance 
could have weakened the expectancy 
slightly.) 
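
The comparison between Coppock’s S-R predictions and the outcomes reported above can be laid out as a small tabulation. The following Python sketch is illustrative only: the group labels (PC, IPC, IPE, SPE, C) and the ordinal relations are taken from the text, while the pair encoding and the function name `split_predictions` are assumptions introduced here and are not part of Coppock’s own analysis.

```python
# Illustrative tabulation only. Group labels follow the text; the encoding
# of each claim as an (x, y) pair is an assumption made for this sketch.
# A pair (x, y) stands for "transfer in group x exceeds transfer in group y".
PREDICTED = [("IPE", "SPE"), ("SPE", "C"), ("IPC", "C"), ("PC", "C")]

# Comparisons the text reports as obtained in the nonparametric tests.
OBSERVED = {("PC", "C"), ("SPE", "C"), ("IPE", "SPE")}


def split_predictions(predicted, observed):
    """Return (confirmed, unconfirmed) predictions, keeping the predicted order."""
    confirmed = [pair for pair in predicted if pair in observed]
    unconfirmed = [pair for pair in predicted if pair not in observed]
    return confirmed, unconfirmed


confirmed, unconfirmed = split_predictions(PREDICTED, OBSERVED)
print("confirmed:", confirmed)
print("unconfirmed:", unconfirmed)
```

Under this encoding only the IPC>C prediction is left unconfirmed, which is the discrepancy the discussion turns on.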

It was stated at the outset of this 
analysis that Coppock’s study (1958) 
shed some light on the ambiguities 
in the GSR experiment of Bitter- 
man, et al. (1953). As noted earlier in 
the discussion, Coppock found equiv- 
alence in GSR reactivity among all 
groups at various stages of the experi- 
ment and that the treatment effects 
were independent of GSR magni- 
tude, per se. While one cannot neces- 
sarily infer between experiments in 
this regard, such a finding does lend 
somewhat more credence to Bitter- 
man’s findings. 

As one final point, it should be 
noted that in Coppock’s experiment 
the GSR did not follow the custom- 
ary S-R curve of extinction. Here 
again, as in Seidel’s experiment 
(1958), the role of a specific response 
in SPC seemed irrelevant to the de- 
gree of association of the PC stimuli. 

Apart from the theoretical problems, Coppock’s IPC group (1 sec.
interstimulus interval) yielded data contrary to those previously
obtained in a comparable condition (cf. p. 59). Silver and Meyer’s
“Backward” PC group (.5 sec. stimulus interval) showed mediation
equal to their simultaneous PC group (1954). The basis for the
discrepancy can only be speculated upon at this point: voluntary
(avoidance) vs. involuntary (GSR) response, rats vs. humans, or
difference in temporal intervals (.5 vs. 1 sec.). Clearly, this
problem needs further investigation.

One experiment (Wickens & Briggs, 
1951) used an identifying response 
during exposure to preconditioning 
stimuli. This one presents a strong 
case to show that SPC is merely an
instance of “mediated stimulus generalization”
(MSG), and that contiguity of the PC stimuli
is unnecessary to obtain the desired transfer
effect. One group of college students 
was exposed to 15 contiguous presen- 
tations of tone and light, while an- 
other group was given 15 separate 
presentations of tone and 15 of light 
in random order. During the PC pe- 
riod the Ss were asked to give a 
verbal recognition response (“Now”)
to the stimuli. Both groups showed 
the same significant advantage in 
transfer of an avoidance R over the 
control groups which had given the 
verbal response to a tone 15 times 
or to a light 15 times. 

On the surface the hypothesis seems to have been substantiated, but
at least two points should be examined before the conclusion is
accepted. If the identifying response is considered as instrumental in
kind, then it follows obviously that the above-noted transfer effect
stands as an example of S-R learning. However, the generalization that
the concept SPC is in like manner an aspect of S-R learning (via
mediated stimulus generalization), although suggested by, does not
necessarily follow from a single experimental outcome. In fact, if one
assumes that the identifying response acted to “set” the Ss to connect
the two stimuli, one would expect the obtained transfer differences to
occur. Stated in another way, the Wickens and Briggs study showed that
the mediating response is a sufficient condition in SPC. In order to
show that the response is both a necessary and a sufficient condition
to effect mediation, it would be essential to eliminate non-response
induced “sets” as possible mediators. An example of the latter would
be the increase in pronouncedness of the PC stimuli by delimiting the
quality and quantity of other stimuli available to S (enhancing
attention value, per se). Further, to compare MSG and SPC directly,
the above study should have included two experimental groups (pure
SPC) exposed to the two stimuli minus the identifying response.
Experimentation noted earlier pertinent to the role of possible
responses during the latency period of SPC and MSG will be discussed
further subsequently.

One other related set of experiments concerns mediate association and
requires consideration as a possible verbal S-R analogue to mediated
stimulus generalization. The general principle of mediate association
requires that previous associations between two ideas will facilitate
the establishment of one of these with a third hitherto unrelated
idea. This concept took the experimental form of learning paired
associates. Peters (1935) used various pairs of meaningful and
nonmeaningful verbal and motor tasks to investigate the concept. The
sequences of associations frequently involved using the response as
the common item (A-B, C-B, A-C). Once the stimulus was the common
element (A-B, A-C, B-C) and once the response in the first pairing
became the stimulus in the second (A-B, B-C, A-C). In no instance did
the t test show the expected facilitating effect. The only procedure
which even approached significance in this direction was the last one
noted above, A-B, B-C, A-C, in which Peters used months (B), numbers
from 1-12 (A), and letters (C).

A more recent experiment by Bugelski and Scharlock (1952), with the
latter procedure and nonsense syllables as the learning material,
produced positive results. Important to note is that here the t test
again failed to yield significance. However, the order effect in the
expected direction was significant. Unfortunately, the individual
results were not available in Peters’ article so that the order effect
could not be tested in his data. Nevertheless, if one adopts
tentatively the suggestion offered by Bugelski and Scharlock that the
order of association is important, the phenomenon fits neatly into the
classical mediating generalization framework. The sequence A-B, B-C,
A-C would be expected to yield facilitation of A-C, whereas A-B, C-B
would not and could be considered similar to backward conditioning.³
The SPC data, although by no means clearcut, suggest that temporal
ordering of PC stimuli (and thus also of the unobserved responses) may
not be important. Both the IPE and IPC groups in Coppock’s study
(1958) and the “Backward” PC group in Silver and Meyer’s experiment
(1954) could be considered the analogue in SPC to the inverted C-B
condition. The IPE and Backward PC groups showed the SPC effect while
the IPC group did not. Although not conclusive, these results provide
an indication that sensory preconditioning may not be simply an
instance of S-R learning as is mediated generalization. Certainly, a
more detailed comparison of the temporal parameters governing the
instances of mediated S-R learning and SPC is needed.

Before going on to theoretical implications, it would be well to
summarize the empirical findings. At this point, SPC seems generally
substantiated as a phenomenon in learning. Further, there are
indications that the required conditions for its occurrence seem to be
little more than repeated stimulus contiguity. The above analysis
hints at a lack of importance in temporal relationship between PC
stimuli. Brogden’s data (Brogden & Gregg, 1951) suggest that number of
repetitions (i.e., analogous to Hull’s N) does not operate in SPC as
in S-R learning. Similarly, Coppock’s study (1958) suggests that
extinction in SPC does not follow the usual curve related to number of
unreinforced CS repetitions. In addition, Coppock’s results, those of
Bahrick (1952, 1953) and of the author (1958) reveal that the
existence of a response during the PC period is unimportant for SPC to
occur. These facts must be taken into account when one attempts to
class sensory preconditioning as an instance of a given
conceptualization of learning (i.e., S-S or S-R theory).

3 Razran (1956) to the contrary notwithstanding, recent experimental
literature indicates that so-called backward conditioning is either an
artifact of conditioning procedures (Harris, 1941) or an unstable,
weak, transient effect (Spooner & Kellogg, 1947).


THEORETICAL INTERPRETATIONS 


Most of the experimenters have not attempted to theorize about the
nature of sensory preconditioning. The exception is Brogden, who
interpreted his results in terms of Guthrian theory and hypothesized
an unknown UCR and CR to the neutral stimuli.

Wickens and Briggs (1951) and Silver and Meyer (1954), in agreement on
the apparent lack of reinforcement in the SPC situation, have
attempted an S-R analysis of the learning process in terms of
“mediating stimulus generalization.” According to Silver and Meyer,
the buzzer and light are actually unconditioned stimuli which lead to
“not directly observed” unconditioned responses. After frequent
pairing, each of these stimuli comes to elicit equally
difficult-to-observe conditioned responses. The resultant in transfer
from this initial cross-conditioning, the reader will recall, is that
in test trials one should expect positive transfer effect.

Coppock’s S-R analysis (1958) differs slightly from the
cross-conditioning approach in that he assumed that in accord with
conditioning principles the temporal relationships between S1 and S2
during PC determine which response-complex could provide the
response-produced stimulus as a basis for mediation. This means, as
shown in Fig. 1, that the S-R mediation process differs between
Coppock’s PC and IPC groups. As pointed out by Coppock (1958, p. 218)
the IPC mediator existent during training was the response-produced
stimulus of a CR (Ris) which was undergoing extinction during that
stage. On the other hand, the usual PC group has a UCR as a base for
the response-produced mediator (e.g., Res:). Note that, as a result of
his traditional S-R analysis, Coppock labeled the CR-mediation group
inverted PC rather than “backward” PC as did Silver and Meyer. His
analysis has the advantage in that it is less ambiguous to predict
from the UCR-CR distinction than from the cross-conditioning analysis
that PC>IPC. Further, Coppock could predict that the IPE group,
benefiting from the added CR-mediation after extinction of PC, should
show greater transfer than a group having undergone simple extinction
of preconditioning connections (SPE). On the other hand, neither
approach is adequate to account for SPC data that reveal
preconditioning independent of the response during the PC period
(Bahrick, 1953; Coppock, 1958; Seidel, 1958).

In formulating his mediation analy- 
sis of SPC, Osgood also abandons 
the concept of reinforcement as a 
necessary condition for learning. In 
fact, after raising the fact of no ex- 
tinction after many secondarily (at 
most) reinforced trials of mere bom- 
bardment by stimuli, he concludes 
that sensory preconditioning pro- 
vides ‘‘one of the strongest arguments 
against reinforcement theory” (1953, 
p. 462). 

Osgood’s S-R explanation differs from the above, however, since he
suggests that “a common perceptual reaction” (e.g., attentional) is
elicited initially to the novel stimuli. “If one of these... is now...
conditioned to a new reaction, the self-stimulation produced by the
mediation process ....” is inferred (p. 461). The obvious difficulty
with his interpretation, to agree with Osgood himself, is that nothing
of the proposed process is directly apparent in the organism’s
behavior. Osgood is forced to draw upon analogous evidence from
conditioning (e.g., Shipley, 1933) to substantiate his point. However,
as noted earlier, whether or not these analogies are correct awaits
further experimentation in the areas of SPC and MSG to establish as
fact the existence of similar stimulus and response relationships in
both types of procedure.

As implied by the discussion in the 
previous section, there appears to be 
some inconsistency between any S-R 
analysis of SPC and the analogous 
evidence cited as support for media- 
tion. With reference to Osgood, al- 
though in all of his discussion of the 
mediation process he states that it is 
some fraction of previous instru- 
mental behavior, it is difficult to see 
how such a conceptualization could 
apply to SPC. No instrumental re- 
sponse is called for in this paradigm, 
nor is any one differentially rein- 
forced if it does occur. Further, from 
data discussed, when autonomic re- 
sponses are made consistent or recur- 
rent with the PC pairing, they appar- 
ently have no influence on the associ- 
ation of the two stimuli. If it is as- 
serted that the autonomic responses 
are not important for mediation but 
the unobserved UCR’s are, one 
simply begs the question. Why is one inferred response important for
mediation and not the other, the existence of which has more
certainty (i.e., through food deprivation or direct GSR
measurement)? Furthermore,
it would be inconsistent or at least 
theoretically not parsimonious for 
Osgood to accept autonomic media- 
tion in one learning instance and to 
deny it in a second situation where it 
has an equal possibility to mediate 
transfer.

An S-S contiguity point of view is proposed by Birch and Bitterman
(1949) who feel, “The results of the sensory pre-conditioning
experiments require us to postulate a process of afferent modification
(sensory integration)... which takes place independently of need
reduction” (p. 302). They later assert that “the latent learning
experiment may be understood as a complication of the sensory
pre-conditioning experiment...” (1951, p. 360). For the essential
condition of sensory integration they postulate, “When two afferent
centers are contiguously activated, a functional relation is
established between them such that the subsequent innervation of one
will arouse the other” (p. 358).

To this writer, the key phrases in 
the above seem to be “functional”
and “such that” since in these words
lies the linkage between the mediated 
S-R and the afferent integrations. 
These verbal ambiguities lead to the 
ultimate conclusion that the diffi- 
culty in deciding upon the correct 
functional explanation for the medi- 
ating process resolves itself into a 
pseudo-problem for psychology. Per- 
haps neurologists will some day pro- 
vide the answer concerning whether 
or not the central connections are 
between afferent-efferent or afferent-afferent
neurons. As a start in this direction, Harris (1948) has
hypothesized that a type of neural summation
occurs when intersensory
stimuli (e.g., visual and auditory)
are paired. His review of the physio- 
logical evidence led to the hypothesis 
that there is high probability of such 
summation taking place in the mid- 
brain and brain stem. The inter- 
sensory facilitation noted in psycho- 
logical studies could then be ac- 
counted for as the behavioral cor- 
relate of this neural integration. 
Further, through some fractionated
intermediary response, common initially
to both sound and light, sensory
preconditioning is supposed to occur. 
In this way, Harris attempts to pro- 
vide justification for a neural locus 
of an attentional or perceptual medi- 
ator (similar apparently to that of 
Osgood). The physiological data re- 
viewed by him, however, seem to pro- 
vide an equally plausible basis for an 
S-S or S-R psychology. 

There is one different type of me- 
diated-response hypothesis which de- 
serves mention. Hebb (1949) has 
proposed a neural associationistic 
theory which includes the develop- 
ment of alternate neural routes in 
the CNS as a correlate of perceptual 
learning. The response which he 
gives as an example of a mediator in 
the formation of a visual percept is 
the scanning eye movement from 
angle to angle along the sides outlin- 
ing a visually presented object. 
Clearly, performing any type of in- 
strumental response can be differen- 
tiated from the mediating response 
in Hebb’s theory. The eye move- 
ment can occur independently of 
what instrumental response the S 
must perform in any given task. In 
this sense, such an independent mediator is also different from
Osgood’s “detached responses” which are stated to be in some measure
part of previous instrumental behavior. Consequently, it seems
plausible to
suggest that, if any type of mediating 
response takes place in sensory pre- 
conditioning, it may be of the Hebb- 
ian variety rather than the usual 
instrumental type seen in mediated 
generalization and mediate associa- 
tion. The rationale for such a pro- 
posal should become more apparent 
in the following section. 


A Comparison of Mediated Generalization and Sensory Preconditioning


At this point it might be well to
note more clearly the rationale which,
it is felt, forces a cautious approach 
upon any attempt at relating these 
concepts. As was mentioned in the in- 
troduction, one characteristic unique 
to the SPC learning paradigm is 
the lack of any response require- 
ment during the latency or critical 
period. Other procedures which have 
been used (i.e., place vs. response, 
latent learning) to test the relative 
merits of S-S and S-R theories all 
require, and sometimes reward, spe- 
cific responses in such a stage. As is 
evident from the Wickens and Briggs 
experiment discussed above, this dis- 
tinction is not readily apparent in 
the analysis of the paradigm for 
mediated stimulus generalization. In- 
deed, in order to better understand 
both SPC and MSG, a step by step 
procedural comparison should be 
helpful. 

Consider specifically the breakdown in Table 1. To illustrate the
comparison, reference is made to Shipley’s study (1933) on mediated
stimulus generalization, which Osgood (1953) cites as a classic
example of both MSG and SPC. In order to make the comparison more
applicable to S-R learning in general, an outline of the Wickens and
Briggs study (1951) was included in the chart. Like SPC the
experiments involved three stages, but the structure of these stages
seems to be observably different from SPC. Shipley first paired a CS1
(faint light) with a definite UCS1 (tap-on-cheek) to condition a CR1
(eyeblink). The Wickens and Briggs procedure differed somewhat from
that of Shipley by utilizing instrumental learning during the first
stage. The Ss were required to give a common response (“Now”) to the
paired CSs (light-tone presentation) or to either CS separately. While
these are easily identified as straightforward conditioning or
instrumental learning procedures respectively, such a description of
the corresponding SPC stage seen in Table 1 seems inaccurate. During
the preconditioning period, Ss are simply exposed to contiguous
stimuli such as buzzer and light, heretofore in conditioning studies
presumed to be neutral stimuli (e.g., NS1 and NS2). Note that no
instrumental or manipulatory or unconditioned response is required or
imposed on the subject. What is more, if any response is made, unlike
MSG, no recognition of it by the experimenter is given through reward
or punishment.

In the second stage of his experiment, Shipley conditioned CR2 (finger
withdrawal) to CS2, previously UCS1 (tap-on-cheek). Wickens and Briggs
followed a similar procedure. At this stage in SPC one PC stimulus is
used as CS1 in a similar conditioning or instrumental learning
procedure. Next, Shipley presented the faint light without further
conditioning and this stimulus, CS1, elicited CR2, finger withdrawal,
in some Ss. Wickens and Briggs obtained positive transfer effect in
both their experimental groups (separate or contiguous presentation of
stimuli). In SPC the final stage is a similar transfer test wherein NS
is used to elicit CR.

TABLE 1
COMPARISON OF PROCEDURES IN MSG AND SPC

Stage                        MSG
                             S            R
1 (Shipley)                  CS1-UCS1     CR1
  (Wickens and Briggs)       CS1-CS2      CR1
                             CS1, CS2     CR1
2 (Shipley)                  CS2          CR2
  (Wickens and Briggs)       CS1          CR2
3 (Shipley)                  CS1          CR2 (mediated)
  (Wickens and Briggs)       CS2          CR2 (mediated)

Note that in Shipley’s study, the 
Wickens and Briggs’ investigation 
and in MSG experiments in general 
some specified response and condi- 
tioning is imposed initially. The at- 
tempt at generalization to SPC of the 
same type of mediation process rests 
upon the assumption that some unob- 
served or unobservable UCR occurs 
to both NSs. Consequently, as noted earlier, Silver and Meyer (1954)
and similarly Wickens and Briggs (1951) hypothesize that any SPC
effect is explained as a mediated resultant of a type of
cross-conditioning between NS1 and NS2 established in the
preconditioning period. In the training period although only NS1 is
used as CS1, an entire stimulus complex is presumed present composed
of NS1, its stimuli derived from its UCR, and those from its CR (UCR
to NS2). And, since these stimuli from its CR are similar to those
produced by the UCR of NS2, in test trials one should expect positive
transfer effect.

It remains an empirical question, however, concerning: (a) whether or
not SPC and MSG are operationally distinguishable concepts, and (b)
whether or not either one or both can be subsumed under the principles
of conditioning. With regard to the latter point, there are at least
the three sources of published data discussed previously which appear
opposed to a conditioning interpretation of SPC. First, there is the
finding that stable backward SPC exists to as great a degree as
simultaneous SPC; and, secondly, Brogden (1951) has reported that
apparently in SPC the strength of the preconditioning association is
not a function of N (number of PC pairings).

Thirdly, the role of the response in 
SPC seems unimportant. Although 
Bahrick’s results noted earlier were 
not definitive for SPC, he did obtain 
a positive transfer effect in his test 
situation for all groups. This generalization occurred despite the
fact that the rats were exposed to PC stimuli under hunger and thirst
motivation, but trained and tested on an avoidance problem when
satiated for hunger and thirst. Bahrick’s data, as a result, suggest
at least two difficulties for an S-R interpretation of mediation in
SPC. The autonomic responses present during exposure, which might have
mediated the transfer, were either nonexistent or present in only a
slight degree during training and testing. Even more striking is the
fact that the autonomic processes dominant during training and testing
(sympathetic processes) were opposed to those present during the
initial pairing of the PC stimuli. Despite both conditions positive
transfer occurred. Furthermore, in the SPC study (Seidel,
1958) cited earlier, the writer sub- 
stantiated Bahrick’s finding in a de- 
sign which included degrees of sim- 
ilarity between autonomic responses 
present during preconditioning (the 
exposure period) and those present 
during the training-testing stages. 
As was pointed out in the analysis of 
that experiment, the most probable 
mediating responses must have been 
either the autonomic response-complex,
per se, or that complex combined
with other unobserved responses.
In either case, differences
among experimental groups should 
have appeared if response-produced 
mediation were involved. Appar- 
ently, these two experiments indicate 
that even when a given response is 
specifically made consistent with the 
PC stimuli and thereby allowed the 
opportunity of serving as the basis 
for mediation, it has no effect in the 
SPC paradigm. In addition, the find- 
ing that the GSR in SPC does not 
seem to follow the normal extinction 
curve argues against an S-R inter- 
pretation of SPC. 

The caution advised in attempting 
to subsume SPC under S-R learning 
theory by calling it an example of 
conditioning seems clearly justified. 
From the above data it appears that 
number of repetitions (i.e., N), temporal
order, and specific responses
have little effect on the establishment 
of stimuli association in SPC. On 
the other hand, the importance of 
these factors in conditioning is well- 
established empirically. 

Still to be considered is the ques- 
tion (a) whether or not SPC and MSG 
are operationally distinguishable con- 
cepts. Returning to the analysis of 
Shipley’s MSG study, it will be re- 
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called that in Stage 1 a faint light 
(CS,) was conditioned to elicit an 
eyeblink (CR,). Since at this point 
the link between light and tap-on- 
cheek (UCS,) was established, pre- 
sumably conditioning provided the 
basis for mediation. In like manner 
Wickens and Briggs established S-R 
connections initially. Ostensibly at 
least the SPC operations (Table 1) 
do not establish such links. In addi- 
tion, since SPC data appear in con- 
tradiction to certain conditioning 
principles, it is implied from the fore- 
going discussion that although MSG 
would fit S-R theory, it should not 
vield data consistent with SPC. 
There is an indirect suggestion of 
such a possibility if one considers the 
studies (Bugelski & Scharlock, 1952; 
Peters, 1935) on mediate association 
(A-B, B-C, A-C) as a verbal parallel 
to the MSG paradigm. The data 


gathered so far from these studies in- 
dicate that a certain order of presen- 
tation of S and R in each stage is es- 


sential to the attainment of facilita- 
tion effects (mediate association). in 
the third stage. If it is recalled from 
Table 1 that MSG involves condi- 
tioning as the basis for mediation, it 
is apparent that a similar type of 
order (the CS-UCS order) should be 
of prime importance in the achieve- 
ment of mediated generalization. In 
fact, while this order principle seems 
to govern both mediate association 
and mediated stimulus generaliza- 
tion, as noted above, it apparently 
does not hold for SPC. A feature 
unique to the preconditioning pro- 
cedure which seems related to this 
difference is the previously men- 
tioned absence of any required instru- 
mental or conditioned response dur- 
ing the preconditioning period. In 
addition to the possible difference in 
temporal parameter governing SPC 
and MSG, other SPC findings discussed
offer the suggestion that N
and mediating-response factors are
not the same either.

What this over-all comparison of 
MSG and SPC indicates is that, al- 
though both paradigms yield similar 
transfer effects in some instances, 
SPC alone appears governed by a 
different set of laws from that of 
classical conditioning. It is empha- 
sized that this is a tentative working 
hypothesis suggested by both partial 
and indirect sources of data. Whether 
or not S-R concepts are able to ac- 
count for SPC and whether MSG 
and SPC are actually two names for a 
single learning process or reflect dif- 
ferent types of learning await a sys- 
tematic parametric comparison be- 
tween the two concepts. Further- 
more, if learning is a two-stage proc- 
ess as Mowrer has already suggested, 
it may be that such a comparison 
could yield the parameters for these 
factors. At any rate, in the most 
conservative sense, one might simply 
state that the SPC studies have 
given results different from those 
previously gotten in conditioning or 
those implied by any S-R media- 
tional learning hypothesis. 


CONCLUSIONS 


If SPC is to be explained by the 
same principles as classical conditioning,
as Reid has suggested (1952),
in addition to following the laws of 
conditioning, sensory precondition- 
ing should be present in an organism 
simple enough to make symbolic 
functioning an untenable interpreta- 
tion. From the available compara- 
tive literature reviewed by this 
writer, SPC does seem to exist in 
such organisms. 

It is noteworthy that Bahrick and 
Reid used the same apparatus for 
preconditioning and training, and 
these authors found that presenting 
the control group with only the test 
stimulus in preconditioning resulted
in the same degree of transfer for experimental
and control groups. The
writer, as well as Silver and Meyer, 
on the other hand, utilized two dis- 
tinctly different pieces of apparatus 
for these conditions; and they ob- 
tained significantly different degrees 
of transfer between experimental 
and control animals. The apparent 
paradox seems obviated if one recog- 
nizes that in the preconditioning 
setting the contiguous sensory stim- 
uli are not limited to those which the 
experimenter has designated. Rather, 
the sensory associations are formed 
among the particular situational cues 
to which the animal attends. These 
sensory associations, thus, consti- 
tute a stimulus complex, in which 
the tone and/or light represent but 
one or two components of the total- 
ity. Thus, it is proposed that the 
important stimuli for the organism 
in the preconditioning situation are 
constituted by the stimulus complex 
to which it attends; and all that the 
experimenter can hope to do is to 
heighten the probability that the 
stimuli in which he is interested will 
be included in the complex of interest 
to the rat. Consequently, there ex- 
ists the need for additional animal 
studies with control of the exposure 
variable and apparatus similarity, 
both of which bear on the subject of 
more definite identification of the 
stimulus. 

At the outset of the paper, the 
initial question asked was whether or 
not sensory preconditioning required 
independent consideration as a phe- 
nomenon in learning. Although the 
data are by no means exhaustive, 
they do suggest that it tentatively 
does require such consideration. In 
addition, regarding the second ques- 
tion of the pertinent laws for sensory 
preconditioning, whether the parameters
of preconditioning and S-R
learning differ or are the same should
be further investigated in the par- 
adigm specified above. However, it 
is apparent at this point that the role 
of the response in SPC is a minor one 
(Bahrick, 1953; Coppock, 1958; 
Seidel, 1958). 

Concerning the third question of 
the problems posed for the learning 
theorist, one issue clearly defined at 
present is that reinforcement as clas- 
sically understood (Hull, 1943) seems 
to be an unnecessary condition for 
SPC to be effective. The value of 
this contribution is not to be under- 
estimated. Indeed, the very concept 
of reinforcement (drive reduction) as 
developed by Hull and his supporters 
has been the center of a major con- 
troversy in learning theory for many 
years. To this end, the sensory preconditioning
research has proved
fruitful. This point is epitomized by
the fact that Osgood (1953), an S-R
reinforcement theorist, has conceded
that the SPC data provide a strong
case for the elimination of reinforcement
as a necessary condition for
learning. Further, if the writer’s 
autonomic interpretation of Osgood’s 
mediational analysis of learning is 
correct, the SPC data seem to pose 
difficulties for the latter’s peripheral 
mediation hypothesis. At present, a 
more tenable approach would be the 
Hebbian-type analysis proposed by 
the writer or the S-S view offered by
Birch and Bitterman. Finally, if one 
entertains the possibility for two- 
factor learning, a systematic com- 
parison of MSG and SPC, which at 
present seem to reflect different proc- 
esses, should prove fruitful for learn- 
ing theory. Thus, granted the need 
for more research, the sensory pre- 
conditioning paradigm already seems 
to have provided a valuable building 
block for the theoretical development 
of psychology.


A NEW PSYCHOPHYSICAL METHOD: METHOD OF TRANSPOSITION
OR EQUAL-APPEARING RELATIONS


TADASU OYAMA 
Hokkaido University, Japan 


In Fig. 1 the right half of the 
straight line appears longer than the 
left half, though they are physically 
of the same length; this is the well-known
Müller-Lyer illusion. To determine
the amount of illusion, one
of the traditional psychophysical 
methods is to let the observer adjust 
the length of the right or left half of 
the line until it appears equal to the 
other half, and the difference in 
length between the two halves is sup- 
posed to represent the amount of il- 
lusion. 

But does this procedure give the 
true measure of illusion? When the 
adjustment is complete, the stimulus 
pattern is no longer the same as the 
original; two halves of the line after 
adjustment are different in length, 
whereas in the original Müller-Lyer
figure they are the same. In other
words, the very operation of measurement
changes the stimulus pattern
from the Müller-Lyer figure to something
else. And we have no assurance
that what we measure by this method 
is the amount of illusion as it exists 
in the original stimulus pattern. 

There are many other classical 
psychophysical methods besides the 
method of adjustment illustrated 
here, but they are alike in that a pair 
of equivalent stimuli is sought for 
measuring the amount of illusions, 
and the measuring operation inevi- 
tably alters the stimulus pattern. It 
would certainly be preferable to have 
a method of measurement which can 
be applied to the stimulus pattern 
without destroying it. To meet this 
demand, a new psychophysical method
has been devised in which the
original stimulus pattern is left intact
while the apparent relation between
its stimulus parts is measured. The
observer is asked to find the neutral
comparison pattern which has the
same apparent relation between its
parts as the original stimulus pattern
has. In other words, the apparent
relation between the standard pair
of stimuli is transposed to the comparison
pair just as a melody is transposed
from C major to D major. For
this reason, we may name the new
method the method of transposition
or the method of equal-appearing relations.
This method will be illustrated
and discussed in the following sections.

FIG. 1. MÜLLER-LYER ILLUSION FIGURE.


SOME APPLICATIONS OF THE NEW METHOD

Müller-Lyer Illusion

Oyama (1955) used a “transposition”
pattern shown in Fig. 2 and
instructed his Ss to adjust the right
half until it had the same apparent
ratio to the constant left half as the
ratio of the corresponding halves of
the Müller-Lyer figure in Fig. 1.
Seven Ss served under five conditions
employing stimuli of various
angles and lengths of arrows. The Ss
made the matches in the traditional
way and also according to the new
procedure advocated here.

FIG. 2. THE TRANSPOSITION PATTERN USED
IN THE EXPERIMENT OF MÜLLER-LYER ILLUSION.
(REDRAWN FROM OYAMA [1955])

FIG. 3. MÜLLER-LYER ILLUSION AS A FUNCTION
OF THE LENGTH AND THE ANGLE OF ARROWS.
The Solid Lines Represent the Results
of the Method of Transposition and the
Dotted Lines Show the Results of the Method
of Adjustment.

Results are shown in Fig. 3. There 
are considerable discrepancies be- 
tween the results of the new and the 
traditional methods. This is only 
natural because the two methods 
measure different stimulus patterns 
as pointed out above. We may say 
that the results of the new method 
represent the Müller-Lyer illusion
more directly than do the results of
the old method, though more systematic
analysis of this phenomenon is
desirable. That the stimulus pattern 
is left intact in this method is its 
major advantage over traditional 
methods. 
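
The difference between the two measures can be made concrete with a small computational sketch. It is only an illustration: the function names and the lengths are hypothetical and are not taken from Oyama (1955), and expressing the transposition match as a difference or a ratio in physical units is an assumption of this sketch.

```python
def illusion_by_adjustment(fixed_half_cm, adjusted_half_cm):
    """Method of adjustment: one half of the illusion figure itself is
    changed until both halves look equal. The physical difference that
    remains at that point is taken as the amount of illusion."""
    return fixed_half_cm - adjusted_half_cm


def illusion_by_transposition(constant_half_cm, matched_half_cm):
    """Method of transposition: the illusion figure is left untouched.
    The variable half of a neutral comparison pattern is set so that the
    pair shows the same apparent relation as the illusion figure.
    Returns the match as a difference (cm) and as a ratio."""
    difference = matched_half_cm - constant_half_cm
    ratio = matched_half_cm / constant_half_cm
    return difference, ratio


# Hypothetical settings by one observer (not data from the paper).
adjustment_result = illusion_by_adjustment(
    fixed_half_cm=3.0, adjusted_half_cm=2.5)
transposition_result = illusion_by_transposition(
    constant_half_cm=3.0, matched_half_cm=3.75)

print(adjustment_result)
print(transposition_result)
```

In the first function the measured figure is itself altered by the measurement; in the second only the neutral comparison pattern is adjusted, so the two values need not agree.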


Figural After-Effects 


If, immediately after inspection of 
a circle, another circle is presented 
at the same place in the visual field, 
the second circle appears smaller or 
larger than a third circle which is 
physically the same size as the second 
circle but is shown at a neutral place. 
This phenomenon has been named 
figural after-effect. In the traditional
method the third circle is used as the
variable comparison stimulus by 
means of which the amount of figural 
after-effect is measured (Oyama, 
1954; Sagara & Oyama, 1957). Our 
new method can also replace this 
method of measuring figural after- 
effects. 

Oyama (1955) prepared 10 inspec- 
tion cards, on each of which was 
drawn a circle of variable size on the 
right side of a fixation mark. On the 
test card were two circles of the same 
size on the right and left sides of a 
fixation mark. In addition to these 
cards, he made a series of “transposition”
cards, on each of which
were drawn a circle of constant diameter
on the left side and another
circle on the right whose diameter
was varied in 1-mm. steps. Subjects
were required to memorize the relation
between the apparent sizes of
the two circles in the test card at the
moment immediately after the inspection,
and to choose from the “transposition”
cards in their hands the card
which had the same apparent relation
between its two circles as the circles
had on the test card.

Results are shown in Fig. 4 for
both the new and the traditional
methods with the same five Ss. The
two curves in this figure are similar
enough to be substituted for one another.
To obtain practically the same
results, however, the new method required
only two experimental sessions
of about a half hour for each S,
whereas the traditional method required
10 sessions. This means that
the new method gave more information
per trial than did the traditional
method and this may be counted as
one of the advantages of the former
over the latter.

FIG. 4. FIGURAL AFTEREFFECT AS A FUNCTION
OF THE SIZE OF INSPECTION-CIRCLE. The
solid line represents the results of the method
of transposition and the dotted line indicates
the results of the method of constant stimuli.
The diameter of test-circle was 4 cm in these
experiments. (Redrawn from Oyama [1955])

Wertheimer (1954) had already 
used practically the same procedure 
in his study of figural after-effect, 
but the steps between transposition 
stimuli seem to me to have been too 
large in his procedure. 


Size Constancy 


In the traditional experiment on 
size constancy, the experimenter pre- 
sents two circles at different dis- 
tances from the observer. One circle 
has a constant size (the standard) 
and is placed at various distances, 
while the other circle is varied in 
size and is presented at a constant 
distance (the variable). The physi- 
cal size of the variable which appears 
equal to the standard is determined 
as a function of the distance of the 
standard. 

FIG. 5. PERCEIVED SIZE AS A FUNCTION OF
THE OBSERVATION DISTANCE. The Solid Line
Indicates the Results of Binocular Observation
and the Dotted Line Represents Those
of Monocular Observation. (Redrawn from
Makino [1956])

Makino (1956) employed the
method of transposition instead of
the traditional method in his study
of size constancy. He presented two
physically equal circles, one at a
constant distance and the other at
various distances. Subjects observed
the apparent size ratio between the
two circles and compared it with
ratios of circles on “transposition”
cards in their hands. His results were
very clear-cut, as shown in Fig. 5. The
logarithm of perceived size is linearly
related to the logarithm of the physical
distance. Results obtained with
binocular observation were fitted by
the equation s = 1.68D^{-0.5}, and the
results with monocular observation
by the equation s = 1.77D^{-0.8}, in
which s indicates the perceived size, 
i.e., the matched size in the trans- 
position cards, and D represents the 
observation distance. He recom- 
mended the new method because it 
made the task of the observer easier 
and reduced intra- and inter-observer
variability.
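
A linear relation between the two logarithms corresponds to a power function of the form s = kD^n, whose constants can be estimated by fitting a straight line in log-log coordinates. The following is a minimal sketch of such a fit. The function name and the data are hypothetical (the data are generated from an arbitrary power function), and nothing here reproduces Makino's measurements or his fitting procedure.

```python
import math


def fit_power_law(distances, sizes):
    """Least-squares line log10(s) = log10(k) + n * log10(D).
    Returns the constants (k, n) of the power function s = k * D**n."""
    xs = [math.log10(d) for d in distances]
    ys = [math.log10(s) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope = exponent
    k = 10 ** (mean_y - n * mean_x)      # intercept = log10 of k
    return k, n


# Synthetic matches generated from an arbitrary power function
# (illustrative values only, not experimental data).
true_k, true_n = 2.0, -0.25
distances = [1.0, 2.0, 4.0, 8.0]
sizes = [true_k * d ** true_n for d in distances]

k, n = fit_power_law(distances, sizes)
print(round(k, 6), round(n, 6))
```

An exponent of zero would correspond to perfect size constancy and an exponent of -1 to matches that follow the visual angle, so the fitted exponent gives a compact index of the degree of constancy.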

Some years ago, Ogasawara (1935) 
studied size constancy in stereoscopic 
vision. He used a photographic stere- 
ogram of two white balls on a table. 
From observers’ verbal reports, he 
analyzed the effects of several factors 
on the phenomenal size of the balls. 
Since the classical psychophysical 
method was hardly useful in such a 
situation, his analysis had to remain 
at the qualitative level. Recently 
Oyama and Teraoka (1955) suc- 
ceeded in adding quantitative an- 
alysis to this interesting study using 
the method of transposition. Their 
analysis revealed positive correla- 
tions between the perceived size and 
the perceived distance and also between
each of these and the S’s
stereoscopic ability as measured by
the Umezu-Shimazu test (1952).


SOME THEORETICAL CONSIDERATIONS


It has been frequently argued that 
the responses in the orthodox psy- 
chophysical methods ought to be re- 
stricted to the two categories of yes 
and no, or to the three categories of 
“greater,” “less,” and “doubtful.”
This principle holds in the new 
method as well. The observer com- 
pares the standard pair of stimuli 
with the ‘transposition’ (compari- 
son) pair of stimuli, and judges 
whether the apparent ratio or differ- 
ence between the two components in 
the former pair is larger or smaller 
than that in the latter. In some cases, 
he reports such judgments explicitly, 
and, in other cases, he makes the ad- 
justments or choices of the transposi- 
tion stimuli according to his implicit 
judgments. In these respects, the
new method has nothing new, except
that the judgments are concerned
with the apparent ratio or difference
in each pair rather than the apparent
magnitude of each stimulus.

Transposition is the sole new requirement
in the new method. The
observer is asked to transpose the 
phenomenal relation, i.e., the appar- 
ent ratio, difference, or some other 
relation from one stimulus pattern 
to another. Gestalt psychologists 
claimed that man is able to transpose 
phenomenal relations in many perceptual
functions just as he can transpose
a melody (Köhler, 1929). For
example, a visual figure remains the 
same in shape, regardless of its 
brightness, location, or size just as a 
melody remains the same when it is 
played in different keys. Die Trans- 
ponierbarkeit, the possibility of trans- 
position, in the perceptual dimen- 
sion, is the only assumption made in 
our method. There is no other requirement
such as fractionation, multiplication,
or direct estimation of
apparent magnitude of stimuli (Stevens,
1957).

The psychophysical methods may 
be classified in many ways. They 
may be classified according to the 
mode of presentation or control of 
stimuli, as in the method of adjust- 
ment, the method of limits, or the 
method of constant stimuli. Or, they 
may be classified according to the ob- 
ject to be measured, as in the method 
of just noticeable differences, the 
method of equivalents, or the method 
of equal sense distances. Our new 
method may be called the method of 
equal-appearing relations by the lat- 
ter criterion, but it does not belong 
in the former classification since it 
bears no intrinsic relationship to any 
method of presentation or control of 
stimuli. Sometimes it is combined 
with the method of adjustment as illustrated
in treatment of the Müller-Lyer
illusion, and sometimes with the
method of constant stimuli, as in 
Makino’s experiment on size con- 
stancy. It may well be combined 
with the method of limits or other 
psychophysical procedures. 

If we do not wish to use the ambiguous
term “relation,” we may call
the new method the method of equal-appearing
differences, or the method
of equal-appearing ratios as the case 
may be. However, such names should 
not imply psychological interval or 
ratio scales as the basis of measure- 
ment in contrast with the method of 
bisection or the method of fractiona- 
tion (Stevens, 1951). In our method, 
the measurements are made on the 
physical scales of the transposition 
stimuli and the results are repre- 
sented in physical units. 

If we interpret “the method of
transposition” or “the method of
equal-appearing relations” in a
broader sense, some of the traditional
methods may be included in this
method. A group of psychophysical 
methods which are called the method 
of equal sense distances, the method 
of equal-appearing intervals, or the 
method of bisection are in a sense 
variations of the method of equal- 
appearing relations, if we under- 
stand sense distance, interval, etc., as 
a relation between two stimulus lim- 
its. When we seek by these methods 
the brightness x which appears half- 
way between two brightnesses a and 
b, we equate the apparent interval 
between a and x to that of x and b. 
In other words, we transpose the 
relation between a and x to the rela- 
tion between x and b. In this sense, 
these methods are variations of the 
method of equal-appearing relations 
or the method of transposition. 
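
The equation of the two apparent intervals can be illustrated by simulating an observer. In the sketch below the observer is replaced by a hypothetical sensation function psi, taken to be logarithmic purely for the sake of the example; the method itself requires no such function, and the names psi and bisect_interval are assumptions of this illustration.

```python
import math


def psi(intensity):
    """Hypothetical sensation function standing in for the observer.
    The logarithmic form is assumed only for this simulation."""
    return math.log(intensity)


def bisect_interval(a, b, tol=1e-9):
    """Find x between a and b (a < b) such that the apparent interval
    from a to x equals the apparent interval from x to b, that is,
    psi(x) - psi(a) == psi(b) - psi(x), by repeated halving."""
    lo, hi = a, b
    while hi - lo > tol:
        x = (lo + hi) / 2
        if psi(x) - psi(a) < psi(b) - psi(x):
            lo = x   # lower interval still too small: raise x
        else:
            hi = x   # lower interval too large: reduce x
    return (lo + hi) / 2


x = bisect_interval(1.0, 100.0)
print(round(x, 6))
```

With the logarithmic function assumed here the midpoint falls at the geometric mean of a and b; a different sensation function would place it elsewhere, while the result is in every case read off in physical units.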

More recently, some investigators 
have made intermodal transposi- 
tions of apparent ratios. For in- 
stance, J. C. Stevens asked his Ss to 
adjust the loudness of the second of 
two noises so that the apparent 
ratio of the two sounds equaled the 
apparent brightness ratio between 
two luminous targets (Stevens, 1957). 
This procedure should be called the 
method of equal-appearing ratios, 
and consequently it is a kind of 
method of equal-appearing relations. 

When one of the perceptual dimen- 
sions in the intermodal transposition 
is the apparent length of a straight 
line and the observer is instructed to 
mark a point dividing the line into 
two parts whose ratio appears equal 
to that of two magnitudes in another 
psychological dimension, the situa- 
tion becomes just the same as that of 
a continuous rating scale. Therefore, 
it is also possible to classify such rat- 
ing methods with the method of 
transposition in a broad sense. 

Even such a group as the so-called 
method of fractionation (Geiger &
Firestone, 1933), the method of multiplication
(Hanes, 1949), and the
constant sum method (Metfessel, 
1947) could be regarded as examples 
of our method. In these methods, the 
S is required to translate the appar- 
ent ratio between two given stimuli 
into a number, or to divide a number 
to express a perceived ratio. Num- 
bers constitute a ratio scale in mathe- 
matics, but we are not sure yet if 
they constitute a ratio scale in the 
psychological world as well. Accord- 
ingly it is better to understand such 
“translation” as an intermodal transposition
between two perceptual dimensions,
one corresponding to the
stimulus continuum, and the other 
to the verbally expressed numbers. 


SUMMARY 


In many experiments dealing with 
perceptual phenomena, investigators 
try to find a stimulus which appears 
equal to a standard stimulus. It often
happens, however, that the procedure 
involved in finding the equivalent 
stimulus alters the stimulus pattern 
so that the measurement is not made 
on the original stimulus pattern. To 
avoid this difficulty, a new psycho- 
physical method, which is named the 
method of transposition or the 
method of equal-appearing relations, 
was proposed. The major advantage 
of this method over the traditional 
ones is that it leaves the original 
stimulus pattern intact.

In this new method, the apparent 
relation between the components of 
a stimulus pattern is observed and 
the same relation is sought in a 
neutral stimulus pattern. In other 
words, the apparent relation is trans- 
posed to a different group of stimulus 
elements, just as a melody is trans- 
posed in different keys. 

This method was applied to the 
measurement of the Müller-Lyer illusion,
figural after-effect, and size
constancy, and the results revealed
that the new method had additional
advantages over traditional methods, 
viz., greater amount of information 
per trial, economy of time, and light- 
ening the task of the S. 

This method will find its most useful
application in the measurement
of strongly structured, brief, and
hard-to-reproduce or continuously
changing perceptual phenomena.
The new method requires no more
ability on the part of Ss than to use
the same categories that are used in
It makes no assumption of underly- 
ing interval or ratio scales. In these 
respects, it is not different from the 
traditional methods. 

If we interpret the method of trans- 
position in a broad sense, it includes 
the method of equal sense distances, 
some rating scale methods, and also 
so-called ratio estimation or ratio 
production methods.


ERRATUM


In the article “A Learning Theory Approach to Research in Schizophrenia”
by Sarnoff A. Mednick (Psychol. Bull., 1958, 55, 316-327), Reference 17
should read: Dunn, W. L., Jr. Visual discrimination of schizophrenic subjects
as a function of stimulus meaning. J. Pers., 1954, 23, 48-64.